Abstract

The class of dual $\phi$-divergence estimators (introduced in Broniatowski and Keziou
(2009) [6]) is explored with respect to robustness through the influence function
approach. For scale and location models, this class is investigated in terms of
robustness and asymptotic relative efficiency. Some hypothesis tests based on dual
divergence criteria are proposed and their robustness properties are studied. The
empirical performances of these estimators and tests are illustrated by Monte Carlo
simulation for both noncontaminated and contaminated data.

1 Introduction



Minimum divergence estimators and related methods have received considerable
attention in statistical inference because of their ability to reconcile
efficiency and robustness. Among others, Beran [3], Tamura and Boos [22],
Simpson [20, 21] and Toma [23] proposed families of parametric estimators
minimizing the Hellinger distance between a nonparametric estimator of the
density of the observations and the model. They showed that those estimators are
both asymptotically efficient and robust. Generalizing earlier work based on
the Hellinger distance, Lindsay [17], Basu and Lindsay [2] and Morales et al. [18]
have investigated minimum divergence estimators, for both discrete and
continuous models. Some families of estimators based on approximate divergence
criteria have also been considered; see Basu et al. [1].

Broniatowski and Keziou [6] have introduced a new minimum divergence
estimation method based on a dual representation of the divergence between
probability measures. Their estimators are defined in a unified way for both
continuous and discrete models. They do not require any prior smoothing and
include the classical maximum likelihood estimators as a benchmark. A special
case for the Kullback-Leibler divergence is presented in Broniatowski [4]. The
present paper presents robustness studies for the classes of estimators
generated by the minimum dual $\phi$-divergence method, as well as for some tests
based on corresponding estimators of the divergence criterion.

We give general results that allow us to identify robust estimators in the class
of dual $\phi$-divergence estimators. We apply this study to the Cressie-Read
divergences and state explicit robustness results for scale models and location
models. Gain in robustness is often paid for by some loss in efficiency. This is
discussed for some scale and location models. Our main remarks are as follows.
All the relevant information pertaining to the model and the true value
of the parameter to be estimated should be used in order to define, when
possible, robust and nearly efficient procedures. Some models allow for such
procedures. The example provided by the scale normal model shows that the
choice of a good estimation criterion is heavily dependent on the acceptable
loss in efficiency in order to achieve a compromise with the robustness
requirement. When sampling under the model is overspread (typically for Cauchy and
logistic models), unsurprisingly the maximum likelihood estimator is both
efficient and robust and therefore should be preferred (see Subsection 3.2).

On the other hand, these estimation results constitute the premises to con- 
struct some robust tests. The purpose of robust testing is twofold. First, the 
level of a test should be stable under small arbitrary departures from the 
null hypothesis (i.e. robustness of validity). Second, the test should have a 
good power under small arbitrary departures from specified alternatives (i.e. 
robustness of efficiency). To control the test stability against outliers in the 
aforementioned senses, we compute the asymptotic level of the test under a 
sequence of contaminated null distributions, as well as the asymptotic power 
of the test under a sequence of contaminated alternatives. These quantities are 
seen to be controlled by the influence function of the test statistic. In this way, 
the robustness of the test is a consequence of the robustness of the test
statistic based on a dual $\phi$-divergence estimator. In many cases, this requirement
is met when the dual $\phi$-divergence estimator itself is robust.

The paper is organized as follows: in Section 2 we present the classes of
estimators generated by the minimum dual $\phi$-divergence method. In Section 3,
for these estimators, we compute the influence functions and establish Fisher
consistency. We particularize this study for the Cressie-Read divergences and
state robustness results for scale models and location models. Section 4 is
devoted to hypothesis testing. We give general convergence results for
contaminated observations and use them to compute the asymptotic level and the
asymptotic power for the tests that we propose. In Section 5, the performances
of the estimators and tests are illustrated by Monte Carlo simulation studies.
In Section 6 we briefly present a proposal for the adaptive choice of tuning
parameters.



2 Minimum divergence estimators 



2.1 Minimum divergence estimators 



Let $\varphi$ be a non-negative convex function defined from $(0,\infty)$ onto $[0,\infty]$ and
satisfying $\varphi(1)=0$. Also extend $\varphi$ at 0 by defining $\varphi(0)=\lim_{x\downarrow 0}\varphi(x)$. Let
$(\mathcal{X},\mathcal{B})$ be a measurable space and $P$ be a probability measure (p.m.) defined on
$(\mathcal{X},\mathcal{B})$. Following Rüschendorf [19], for any p.m. $Q$ absolutely continuous (a.c.)
w.r.t. $P$, the $\phi$-divergence between $Q$ and $P$ is defined by

$$\phi(Q,P) := \int \varphi\left(\frac{dQ}{dP}\right) dP. \tag{1}$$

When $Q$ is not a.c. w.r.t. $P$, we set $\phi(Q,P)=\infty$. We refer to Liese and Vajda
[16] for an overview on the origin of the concept of divergence in Statistics.



A commonly used family of divergences is the so-called "power divergences",
introduced by Cressie and Read [9] and defined by the class of functions

$$x\in\mathbb{R}_+^{*}\mapsto\varphi_\gamma(x):=\frac{x^{\gamma}-\gamma x+\gamma-1}{\gamma(\gamma-1)} \tag{2}$$

for $\gamma\in\mathbb{R}\setminus\{0,1\}$ and $\varphi_0(x):=-\log x+x-1$, $\varphi_1(x):=x\log x-x+1$ with
$\varphi_\gamma(0)=\lim_{x\downarrow 0}\varphi_\gamma(x)$, $\varphi_\gamma(\infty)=\lim_{x\to\infty}\varphi_\gamma(x)$, for any $\gamma\in\mathbb{R}$. The Kullback-Leibler
divergence (KL) is associated with $\varphi_1$, the modified Kullback-Leibler ($KL_m$)
with $\varphi_0$, the $\chi^2$ divergence with $\varphi_2$, the modified $\chi^2$ divergence ($\chi^2_m$) with $\varphi_{-1}$ and
the Hellinger distance with $\varphi_{1/2}$.
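As a small illustration of (2), the following sketch evaluates $\varphi_\gamma$ for a few values of $\gamma$, including the two limiting cases. The function name `cressie_read` is ours and only positive arguments are handled; it is meant as a reading aid rather than a reference implementation.

```python
import math

def cressie_read(x, gamma):
    """Evaluate varphi_gamma(x) from (2) for x > 0, including gamma = 0 and gamma = 1."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    if gamma == 0:
        return -math.log(x) + x - 1.0          # modified Kullback-Leibler
    if gamma == 1:
        return x * math.log(x) - x + 1.0       # Kullback-Leibler
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

# Each member of the family vanishes at x = 1 and is positive elsewhere.
for gamma in (-1, 0, 0.5, 1, 2):
    print(gamma, cressie_read(1.0, gamma), cressie_read(2.0, gamma))
```

For $\gamma=2$ the function reduces to $(x-1)^2/2$ and for $\gamma=1/2$ to $2(\sqrt{x}-1)^2$, which gives a quick sanity check of the two special cases named above.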

Let $\{P_\theta:\theta\in\Theta\}$ be some identifiable parametric model with $\Theta$ a subset
of $\mathbb{R}^d$. Consider the problem of estimation of the unknown true value of the
parameter $\theta_0$ on the basis of an i.i.d. sample $X_1,\dots,X_n$ with p.m. $P_{\theta_0}$.
When all p.m. $P_\theta$ share the same finite support $S$ which is independent of
the parameter $\theta$, the $\phi$-divergence between $P_\theta$ and $P_{\theta_0}$ has the form

$$\phi(P_\theta,P_{\theta_0})=\sum_{j\in S}\varphi\left(\frac{P_\theta(j)}{P_{\theta_0}(j)}\right)P_{\theta_0}(j).$$

In this case, Liese and Vajda [15], Lindsay [17] and Morales et al. [18]
investigated the so-called "minimum $\phi$-divergence estimators" (minimum disparity
estimators in Lindsay [17]) of the parameter $\theta_0$ defined by

$$\widehat{\theta}_n:=\arg\inf_{\theta\in\Theta}\phi(P_\theta,P_n), \tag{3}$$

where $\phi(P_\theta,P_n)$ is the plug-in estimator of $\phi(P_\theta,P_{\theta_0})$,

$$\phi(P_\theta,P_n)=\sum_{j\in S}\varphi\left(\frac{P_\theta(j)}{P_n(j)}\right)P_n(j),$$

$P_n$ being the empirical measure associated to the sample. The interest in these
estimators is motivated by the fact that a suitable choice of the divergence
may lead to an estimator more robust than the maximum likelihood one (see
also Jimenez and Shao [14]). For continuous models, the estimators in (3) are
not defined. Basu and Lindsay [2], among others, proposed smoothed versions
of (3) in this case.

In the following, for notational clearness we write $\phi(\alpha,\theta)$ for $\phi(P_\alpha,P_\theta)$ for $\alpha$
and $\theta$ in $\Theta$. We assume that for any $\theta\in\Theta$, $P_\theta$ has density $p_\theta$ with respect to
some dominating $\sigma$-finite measure $\lambda$.

The divergence $\phi(\alpha,\theta_0)$ can be represented as resulting from an optimization
procedure. This result has been obtained independently by Liese and
Vajda [12] and Broniatowski and Keziou [5] who called it the dual form of a
divergence, due to its connection with convex analysis. Assuming the strict
convexity and the differentiability of the function $\varphi$, it holds

$$\varphi(t)\ge\varphi(s)+\varphi'(s)(t-s) \tag{4}$$

with equality only for $s=t$. Let $\alpha$ and $\theta_0$ be fixed and put $t=p_\alpha(x)/p_{\theta_0}(x)$
and $s=p_\alpha(x)/p_\theta(x)$ in (4) and then integrate with respect to $P_{\theta_0}$. This gives

$$\phi(\alpha,\theta_0)=\int\varphi\left(\frac{p_\alpha}{p_{\theta_0}}\right)dP_{\theta_0}=\sup_{\theta\in\Theta}\int m(\theta,\alpha)\,dP_{\theta_0} \tag{5}$$

with $m(\theta,\alpha):x\mapsto m(\theta,\alpha,x)$ and

$$m(\theta,\alpha,x):=\int\varphi'\left(\frac{p_\alpha}{p_\theta}\right)dP_\alpha-\left\{\varphi'\left(\frac{p_\alpha}{p_\theta}(x)\right)\frac{p_\alpha}{p_\theta}(x)-\varphi\left(\frac{p_\alpha}{p_\theta}(x)\right)\right\}. \tag{6}$$

The supremum in (5) is unique and is attained at $\theta=\theta_0$, independently of
the value of $\alpha$. Naturally, a class of estimators of $\theta_0$, called "dual $\phi$-divergence
estimators" (D$\phi$E's), is defined by

$$\widehat{\theta}_n(\alpha):=\arg\sup_{\theta\in\Theta}\int m(\theta,\alpha)\,dP_n,\qquad\alpha\in\Theta. \tag{7}$$

Formula (7) defines a family of M-estimators indexed by some instrumental
value of the parameter $\alpha$ and by the function $\varphi$ defining the divergence. The
choice of $\alpha$ appears as a major feature in the estimation procedure. Its value
is strongly dependent upon some a priori knowledge on the value of the
parameter to be estimated. In some examples in Subsection 3.2 it even appears
that a sharp a priori knowledge on the order of $\theta_0$ leads to nearly efficient and
robust estimates. This argues in favor of using the available information
pertaining to the model and the data. Section 6 briefly presents a proposal
for the adaptive choice of $\alpha$.
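To make (7) concrete, here is a minimal sketch for the normal scale model $N(0,\theta^2)$ and a Cressie-Read divergence with $\gamma\notin\{0,1\}$. It relies on two facts that we derived for this illustration and that are not stated in this form in the text: for the Cressie-Read family, $m(\theta,\alpha,x)=\frac{1}{\gamma-1}\{\int(p_\alpha/p_\theta)^{\gamma-1}dP_\alpha-1\}-\frac{1}{\gamma}\{(p_\alpha(x)/p_\theta(x))^{\gamma}-1\}$, and for the normal scale model the integral equals $\alpha^{-\gamma}\theta^{\gamma-1}(\gamma/\alpha^2+(1-\gamma)/\theta^2)^{-1/2}$ whenever the last bracket is positive. The function names, the grid search, and the restriction of the search to $\theta<\alpha$ (where the weights stay bounded for $\gamma<0$) are assumptions of the sketch.

```python
import numpy as np

def dual_objective(theta, alpha, gamma, x):
    """Empirical mean of m(theta, alpha, X_i) for N(0, theta^2), gamma not in {0, 1}."""
    a = gamma / alpha**2 + (1.0 - gamma) / theta**2
    if a <= 0:
        return -np.inf  # the integral term diverges; such theta are excluded in this sketch
    integral = alpha**(-gamma) * theta**(gamma - 1.0) / np.sqrt(a)
    # log of p_alpha(x) / p_theta(x), computed on the log scale for stability
    log_ratio = np.log(theta / alpha) - 0.5 * x**2 * (1.0 / alpha**2 - 1.0 / theta**2)
    sample_term = np.mean(np.exp(gamma * log_ratio) - 1.0) / gamma
    return (integral - 1.0) / (gamma - 1.0) - sample_term

def dual_estimator(x, alpha, gamma, grid):
    """Grid-search version of the argsup in (7)."""
    values = [dual_objective(t, alpha, gamma, x) for t in grid]
    return grid[int(np.argmax(values))]

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=5000)       # true scale theta_0 = 1
contaminated = sample.copy()
contaminated[:100] = 10.0                       # 2% gross outliers

grid = np.linspace(0.3, 1.85, 1551)             # search restricted to theta < alpha = 1.9
theta_hat = dual_estimator(sample, alpha=1.9, gamma=-1.0, grid=grid)
theta_hat_contaminated = dual_estimator(contaminated, alpha=1.9, gamma=-1.0, grid=grid)
mle_contaminated = np.sqrt(np.mean(contaminated**2))
print(theta_hat, theta_hat_contaminated, mle_contaminated)
```

On the contaminated sample the maximum likelihood estimate of the scale is pulled far from 1 by the outliers, whereas the dual estimate computed on the restricted grid stays close to it.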

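For each $\alpha\in\Theta$, the divergence $\phi(P_\alpha,P_{\theta_0})$ between $P_\alpha$ and $P_{\theta_0}$ is estimated by

$$\widehat{\phi}_n(\alpha,\theta_0):=\int m(\widehat{\theta}_n(\alpha),\alpha)\,dP_n=\sup_{\theta\in\Theta}\int m(\theta,\alpha)\,dP_n. \tag{8}$$

Further, since

$$\inf_{\alpha\in\Theta}\phi(\alpha,\theta_0)=\phi(\theta_0,\theta_0)=0,$$

and since the infimum in the above display is unique due to the strict
convexity of $\varphi$, a natural definition of estimators of $\theta_0$, called "minimum dual
$\phi$-divergence estimators" (MD$\phi$E's), is provided by

$$\widehat{\alpha}_n:=\arg\inf_{\alpha\in\Theta}\widehat{\phi}_n(\alpha,\theta_0)=\arg\inf_{\alpha\in\Theta}\sup_{\theta\in\Theta}\int m(\theta,\alpha)\,dP_n. \tag{9}$$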

The D$\phi$E's enjoy the same invariance property as the maximum likelihood
estimator does. Invariance with respect to a reparametrization (one to one
transformation of the parameter space) holds with direct substitution in (7).
Also, consider a one to one differentiable transformation of the observations,
say $Y=T(X)$ and the Jacobian $J(x)=\frac{d}{dx}T(x)$. Let $\widehat{\theta}_n(\alpha)$ be defined in (7),
based on the $X_i$'s. Let $f_\theta(y)$ denote the density of the transformed variable $Y$
and $\widehat{\theta}^{*}_n(\alpha)$ be the D$\phi$E based on the $Y_i$'s in the transformed model (with the
same parameter $\theta$). Specifically,

$$\widehat{\theta}^{*}_n(\alpha)=\arg\sup_{\theta\in\Theta}\left\{\int\varphi'\left(\frac{f_\alpha}{f_\theta}(y)\right)f_\alpha(y)\,dy-\frac1n\sum_{i=1}^n\left(\varphi'\left(\frac{f_\alpha}{f_\theta}(Y_i)\right)\frac{f_\alpha}{f_\theta}(Y_i)-\varphi\left(\frac{f_\alpha}{f_\theta}(Y_i)\right)\right)\right\}.$$

Since

$$f_\theta(y)=p_\theta(T^{-1}(y))\,|J(T^{-1}(y))|^{-1}$$

for all $\theta\in\Theta$, it follows that $\widehat{\theta}^{*}_n(\alpha)=\widehat{\theta}_n(\alpha)$, which is to say that the D$\phi$E's
are invariant estimators under any regular transformation of the observation
space. The same invariance properties hold for MD$\phi$E's.

Broniatowski and Keziou [5] have proved both the weak and the strong
consistency, as well as the asymptotic normality for the estimators $\widehat{\theta}_n(\alpha)$ and $\widehat{\alpha}_n$. In
the next sections, we study robustness properties for these classes of estimators
and robustness of some tests based on dual $\phi$-divergence estimators.

2.2 Some comments on robustness 

The special form of divergence based estimators to be studied in this paper 
leads us to handle robustness characteristics through the influence function 
approach. An alternative and appealing robustness analysis in the minimum 
divergence methods is provided by the Residual Adjustment Function (RAF) 
(introduced in Lindsay [17]), which explains the incidence of atypical Pearson
residuals, corresponding to over- or sub-sampling, in the stability of the
estimates. This method is quite natural for finitely supported models. In the 
case when the densities in the model are continuous, the Pearson residuals are 
estimated non parametrically which appears to cause quite a number of diffi- 
culties when adapted to minimum dual divergence estimation. This motivates 
the present choice in favor of the influence function approach. 

Let $\alpha$ be fixed. For the Cressie-Read divergences, the equation whose solution
is $\widehat{\theta}_n(\alpha)$ defined by (7) is

$$-\int\left(\frac{p_\alpha}{p_\theta}\right)^{\gamma}\dot{p}_\theta\,d\lambda+\frac1n\sum_{i=1}^n\left(\frac{p_\alpha(X_i)}{p_\theta(X_i)}\right)^{\gamma}\frac{\dot{p}_\theta(X_i)}{p_\theta(X_i)}=0, \tag{10}$$

where $\dot{p}_\theta$ is the derivative with respect to $\theta$ of $p_\theta$. Starting from the definition
given by (7), this equation is obtained by setting to zero the derivative
with respect to $\theta$ of $\int m(\theta,\alpha)\,dP_n$.

Let $x$ be some outlier. The role of $x$ in (10) is handled in the term

$$\left(\frac{p_\alpha(x)}{p_\theta(x)}\right)^{\gamma}\frac{\dot{p}_\theta(x)}{p_\theta(x)}. \tag{11}$$

The more stable this term, the more robust the estimate. In the classical
case of the maximum likelihood estimator (which corresponds to $\widehat{\theta}_n(\alpha)$ with
$\gamma=0$ and does not depend on $\alpha$), this term writes as $\dot{p}_\theta(x)/p_\theta(x)$, which is the likelihood
score function associated to $x$. It is well known that, for most models, this
term is unbounded when $x$ ranges over $\mathbb{R}$, which means that the maximum
likelihood estimator is not robust. In this respect, (11) appears as a weighted
likelihood score function. In our approach, for several models, such as the
normal scale, (11) is a bounded function of $x$, although $\dot{p}_\theta(x)/p_\theta(x)$ itself is not.
Thus, in the estimating equation (10), the score function is downweighted for
large observations. The robustness of $\widehat{\theta}_n(\alpha)$ comes as a downweight effect of
the quantity $\dot{p}_\theta(x)/p_\theta(x)$ through the multiplicative term $(p_\alpha(x)/p_\theta(x))^{\gamma}$, which depends
on the choice of the divergence. This choice is dictated by the form of $p_\alpha(x)/p_\theta(x)$
for large $x$ and $\alpha$ fixed. For the models we consider as examples, for large
$x$ and $\alpha$ fixed, the quantity $p_\alpha(x)/p_\theta(x)$ can be large, close to zero, or close to one.
Then we appropriately choose $\gamma$ to be negative, respectively positive in order
to obtain the downweight effect. In the next section we study in detail these
robustness properties by means of the influence function.
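The downweighting described above can be seen numerically. The sketch below evaluates the term (11) for the normal scale model $N(0,\theta^2)$ with $\theta=1$ and $\alpha=1.9$, once with $\gamma=0$ (plain likelihood score) and once with $\gamma=-1$. The function name and the chosen evaluation points are ours.

```python
import numpy as np

def weighted_score(x, theta, alpha, gamma):
    """Term (11): (p_alpha(x)/p_theta(x))^gamma * d/dtheta log p_theta(x) for N(0, theta^2)."""
    log_ratio = np.log(theta / alpha) - 0.5 * x**2 * (1.0 / alpha**2 - 1.0 / theta**2)
    score = (x**2 / theta**2 - 1.0) / theta
    return np.exp(gamma * log_ratio) * score

x = np.array([0.0, 1.0, 3.0, 10.0, 30.0])
plain = weighted_score(x, theta=1.0, alpha=1.9, gamma=0.0)      # grows like x^2
weighted = weighted_score(x, theta=1.0, alpha=1.9, gamma=-1.0)  # stays bounded
print(plain)
print(weighted)

# Largest absolute value of the weighted score on a fine grid
fine = np.linspace(-50.0, 50.0, 100001)
sup_weighted = np.max(np.abs(weighted_score(fine, 1.0, 1.9, -1.0)))
print(sup_weighted)
```

With $\alpha>\theta$ the ratio $p_\alpha(x)/p_\theta(x)$ is large for large $x$, so a negative $\gamma$ is the choice that shrinks the contribution of outliers.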

Some alternative choices have been proposed in the literature. Basu et al. [1]
proposed to alter the likelihood score factor by the multiplicative term $p_\theta^{\beta}(x)$,
where $\beta>0$. This induces an estimating procedure which is connected to
the minimization of a density power divergence. Both their approach and the
present one are adaptive in the sense that the downweighting of the likelihood score
factor is calibrated on the data.

Robustness as handled in the present paper is against the bias due to the
presence of very few outliers in the data set. Bias due to misspecification of
the model is not considered. It has been observed that D$\phi$E's are biased
under misspecification even in simple situations (for example when estimating
the mean in a normal model with assumed variance 1, whereas the true
variance is not 1); see Broniatowski and Vajda [8]. Similar biases are unavoidable in
parametric inference and can only be reduced through specific adaptive
procedures, not studied here. For alternative robust M-estimation methods using
divergences we refer to Toma [24].



3 Robustness of the estimators 



3.1 Fisher consistency and influence functions



In order to measure the robustness of an estimator it is common to compute 
the influence function of the corresponding functional. 

A map $T$ which sends an arbitrary probability measure into the parameter
space is a statistical functional corresponding to an estimator $T_n$ of the
parameter $\theta$ whenever $T(P_n)=T_n$.

This functional is called Fisher consistent for the parametric model $\{P_\theta:\theta\in\Theta\}$
if $T(P_\theta)=\theta$, for all $\theta\in\Theta$.

The influence function of the functional $T$ at $P$ measures the effect on $T$ of
adding a small mass at $x$ and is defined as

$$\mathrm{IF}(x;T,P)=\lim_{\varepsilon\to 0}\frac{T(P_{\varepsilon x})-T(P)}{\varepsilon} \tag{12}$$

where $P_{\varepsilon x}=(1-\varepsilon)P+\varepsilon\delta_x$ and $\delta_x$ is the Dirac measure putting all its mass
at $x$.

The gross error sensitivity measures approximately the maximum contribution
to the estimation error that can be produced by a single outlier and is defined
as

$$\sup_x\|\mathrm{IF}(x;T,P)\|.$$

Whenever the gross error sensitivity is finite, the estimator associated with
the functional $T$ is called B-robust.
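Definition (12) can be approximated by a finite difference when $P$ is replaced by a discrete distribution. The sketch below does this for the functional of the normal-scale maximum likelihood estimator with known mean 0, $T(P)=(\int y^2\,dP)^{1/2}$, whose influence function at $N(0,1)$ is $(x^2-1)/2$. The helper names, the discretization of $N(0,1)$ and the value of `eps` are choices made for this illustration.

```python
import numpy as np

def influence_fd(functional, points, weights, x, eps=1e-6):
    """Finite-difference version of (12) for a discrete p.m. given by points and weights."""
    base = functional(points, weights)
    contaminated_points = np.append(points, x)
    contaminated_weights = np.append((1.0 - eps) * weights, eps)
    return (functional(contaminated_points, contaminated_weights) - base) / eps

def scale_mle(points, weights):
    """Functional of the normal-scale MLE with known mean 0."""
    return np.sqrt(np.sum(weights * points**2))

# Discretized N(0, 1): grid points weighted by the normalized density.
grid = np.linspace(-10.0, 10.0, 20001)
w = np.exp(-0.5 * grid**2)
w /= w.sum()

for x in (0.0, 3.0, 10.0, 100.0):
    print(x, influence_fd(scale_mle, grid, w, x))
```

The values grow without bound as $x$ moves away from the origin, so the gross error sensitivity of this functional is infinite and the estimator is not B-robust.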

Let $X_1,\dots,X_n$ be an i.i.d. sample with p.m. $P$.

Let $\alpha$ be fixed and consider the dual $\phi$-divergence estimators $\widehat{\theta}_n(\alpha)$ defined in
(7). The functional associated to an estimator $\widehat{\theta}_n(\alpha)$ is

$$T_\alpha(P):=\arg\sup_{\theta\in\Theta}\int m(\theta,\alpha,y)\,dP(y). \tag{13}$$

The functional $T_\alpha$ is Fisher consistent. Indeed, the function $\theta\mapsto\int m(\theta,\alpha)\,dP_{\theta_0}$
has a unique maximizer $\theta=\theta_0$. Therefore $T_\alpha(P_\theta)=\theta$, for all $\theta\in\Theta$.

We denote by $m'(\theta,\alpha)=\frac{\partial}{\partial\theta}m(\theta,\alpha)$ the $d$-dimensional column vector with entries
$\frac{\partial}{\partial\theta_i}m(\theta,\alpha)$ and by $m''(\theta,\alpha)$ the $d\times d$ matrix with entries $\frac{\partial^2}{\partial\theta_i\partial\theta_j}m(\theta,\alpha)$.

In the rest of the paper, for each $\alpha$, we suppose that the function $\theta\mapsto m(\theta,\alpha)$
is twice continuously differentiable and that the matrix $\int m''(\theta_0,\alpha)\,dP_{\theta_0}$ exists
and is invertible. We also suppose that, for each $\alpha$, all the partial derivatives of
order 1 and 2 of the function $\theta\mapsto m(\theta,\alpha)$ are respectively dominated on some
neighborhoods of $\theta_0$ by $P_{\theta_0}$-integrable functions. This justifies the subsequent
interchanges of differentiation with respect to $\theta$ and integration.

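Proposition 1 The influence function of the functional $T_\alpha$ corresponding to
an estimator $\widehat{\theta}_n(\alpha)$ is given by

$$\mathrm{IF}(x;T_\alpha,P_{\theta_0})=\left[\int m''(\theta_0,\alpha)\,dP_{\theta_0}\right]^{-1}\left\{\int\varphi''\left(\frac{p_\alpha}{p_{\theta_0}}\right)\frac{p_\alpha^2}{p_{\theta_0}^2}\,\dot{p}_{\theta_0}\,d\lambda-\varphi''\left(\frac{p_\alpha(x)}{p_{\theta_0}(x)}\right)\frac{p_\alpha^2(x)}{p_{\theta_0}^3(x)}\,\dot{p}_{\theta_0}(x)\right\}.$$

Particularizing $\alpha=\theta_0$ in Proposition 1 yields

$$\mathrm{IF}(x;T_{\theta_0},P_{\theta_0})=-\left[\int m''(\theta_0,\theta_0)\,dP_{\theta_0}\right]^{-1}\varphi''(1)\,\frac{\dot{p}_{\theta_0}(x)}{p_{\theta_0}(x)}$$

and taking into account that

$$-\int m''(\theta_0,\theta_0)\,dP_{\theta_0}=\varphi''(1)\,I_{\theta_0}$$

it holds

$$\mathrm{IF}(x;T_{\theta_0},P_{\theta_0})=I_{\theta_0}^{-1}\,\frac{\dot{p}_{\theta_0}(x)}{p_{\theta_0}(x)} \tag{14}$$

where $I_{\theta_0}$ is the information matrix $I_{\theta_0}=\int\frac{\dot{p}_{\theta_0}\dot{p}_{\theta_0}^{t}}{p_{\theta_0}}\,d\lambda$.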

We now look at the corresponding estimators of the $\phi$-divergence. For fixed
$\alpha$, the divergence $\phi(P_\alpha,P_{\theta_0})$ between the probability measures $P_\alpha$ and $P_{\theta_0}$ is
estimated by (8). The statistical functional associated to $\widehat{\phi}_n(\alpha,\theta_0)$ is

$$U_\alpha(P):=\int m(T_\alpha(P),\alpha,y)\,dP(y). \tag{15}$$

The functional $U_\alpha$ has the property that $U_\alpha(P_\theta)=\phi(\alpha,\theta)$, for any $\theta\in\Theta$.
Indeed, using the fact that $T_\alpha$ is a Fisher consistent functional,

$$U_\alpha(P_\theta)=\int m(T_\alpha(P_\theta),\alpha,y)\,dP_\theta(y)=\int m(\theta,\alpha,y)\,dP_\theta(y)=\phi(\alpha,\theta)$$

for all $\theta\in\Theta$.

Proposition 2 The influence function of the functional $U_\alpha$ corresponding to
the estimator $\widehat{\phi}_n(\alpha,\theta_0)$ is given by

$$\mathrm{IF}(x;U_\alpha,P_{\theta_0})=-\phi(\alpha,\theta_0)+m(\theta_0,\alpha,x). \tag{16}$$



For a minimum dual $\phi$-divergence estimator $\widehat{\alpha}_n$ defined in (9), the
corresponding functional is

$$V(P):=\arg\inf_{\alpha\in\Theta}U_\alpha(P)=\arg\inf_{\alpha\in\Theta}\int m(T_\alpha(P),\alpha,y)\,dP(y). \tag{17}$$

The statistical functional $V$ is Fisher consistent. Indeed,

$$V(P_\theta)=\arg\inf_{\alpha\in\Theta}U_\alpha(P_\theta)=\arg\inf_{\alpha\in\Theta}\phi(\alpha,\theta)=\theta$$

for all $\theta\in\Theta$.

In the following proposition, we suppose that the function $m(\theta,\alpha)$ admits
partial derivatives of order 1 and 2 with respect to $\theta$ and $\alpha$, and that
the conditions permitting differentiation of $m(\theta,\alpha)$ under the integral sign hold.
The following result states that, unlike $\widehat{\theta}_n(\alpha)$, an estimator $\widehat{\alpha}_n$ is generally
not robust. Indeed, it has the same robustness properties as the maximum
likelihood estimator, since its influence function is unbounded in most
cases. Whatever the divergence, the estimators $\widehat{\alpha}_n$ have the same
influence function.

Proposition 3 The influence function of the functional $V$ corresponding to
an estimator $\widehat{\alpha}_n$ is given by

$$\mathrm{IF}(x;V,P_{\theta_0})=I_{\theta_0}^{-1}\,\frac{\dot{p}_{\theta_0}(x)}{p_{\theta_0}(x)}. \tag{18}$$



3.2 Robustness of the estimators for scale models and location models 



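In this subsection, examining the expressions of the influence functions, we
give conditions for attaining the B-robustness of the dual $\phi$-divergence
estimators $\widehat{\theta}_n(\alpha)$, as well as of the corresponding divergence estimators. The
case of interest in our B-robustness study is $\alpha\neq\theta_0$ since, as observed above,
the choice $\alpha=\theta_0$ generally leads to unbounded influence functions. For the
Cressie-Read family of divergences (2) it holds

$$\mathrm{IF}(x;T_\alpha,P_{\theta_0})=\left[\int m''(\theta_0,\alpha)\,dP_{\theta_0}\right]^{-1}\left\{\int\left(\frac{p_\alpha}{p_{\theta_0}}\right)^{\gamma}\dot{p}_{\theta_0}\,d\lambda-\left(\frac{p_\alpha(x)}{p_{\theta_0}(x)}\right)^{\gamma}\frac{\dot{p}_{\theta_0}(x)}{p_{\theta_0}(x)}\right\}$$

and

$$\mathrm{IF}(x;U_\alpha,P_{\theta_0})=-\phi(\alpha,\theta_0)+m(\theta_0,\alpha,x)
=-\phi(\alpha,\theta_0)+\frac{1}{\gamma-1}\left\{\int\left(\frac{p_\alpha}{p_{\theta_0}}\right)^{\gamma-1}dP_\alpha-1\right\}-\frac{1}{\gamma}\left\{\left(\frac{p_\alpha(x)}{p_{\theta_0}(x)}\right)^{\gamma}-1\right\}.$$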



3.2.1 Scale models 



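For a given density $p$, it holds $p_\theta(x)=\frac{1}{\theta}\,p\left(\frac{x}{\theta}\right)$ and $\dot{p}_\theta(x)=-\frac{1}{\theta^2}\left[p\left(\frac{x}{\theta}\right)+\frac{x}{\theta}\,\dot{p}\left(\frac{x}{\theta}\right)\right]$.
Consider the following conditions: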



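(A.1) $\int|u\,\dot{p}(u)|\,du<\infty$.

(A.2) $\sup_x\left|\frac{p(\alpha^{-1}x)}{p(\theta_0^{-1}x)}\right|<\infty$.

(A.3) $\sup_x\left|\frac{p(\theta_0^{-1}x)}{p(\alpha^{-1}x)}\right|<\infty$.

(A.4) $\sup_x\left|\left(\frac{p(\alpha^{-1}x)}{p(\theta_0^{-1}x)}\right)^{\gamma}x\,\frac{\dot{p}(\theta_0^{-1}x)}{p(\theta_0^{-1}x)}\right|<\infty$.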

Proposition 4 For scale models, if the conditions (A.2) (for the case $\gamma>0$)
or (A.3) (for the case $\gamma<0$) together with (A.1) and (A.4) are satisfied, then
$\widehat{\theta}_n(\alpha)$ is B-robust.

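As a particular case, consider the problem of robust estimation of the
parameter $\theta_0=\sigma$ of the univariate normal model, when the mean $m$ is known,
intending to use an estimator $\widehat{\theta}_n(\alpha)$ with $\alpha\neq\sigma$. We are interested in those
divergences from the Cressie-Read family and those possible values of $\alpha$ for
which $\widehat{\theta}_n(\alpha)$ is B-robust. We have

$$\mathrm{IF}(x;T_\alpha,P_\sigma)=\left[\int m''(\sigma,\alpha)\,dP_\sigma\right]^{-1}\left\{\int\left(\frac{p_\alpha}{p_\sigma}\right)^{\gamma}\frac{\dot{p}_\sigma}{p_\sigma}\,dP_\sigma-\left(\frac{p_\alpha(x)}{p_\sigma(x)}\right)^{\gamma}\frac{\dot{p}_\sigma(x)}{p_\sigma(x)}\right\}. \tag{19}$$

Fig. 1. Influence functions $\mathrm{IF}(x;T_\alpha,P_\sigma)$ for the normal scale model, when $m=0$, the
true scale parameter is $\sigma=1$ and $\alpha=1.9$.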

It is easily seen that $\mathrm{IF}(x;T_\alpha,P_\sigma)$ is bounded whenever the function $\left(\frac{p_\alpha(x)}{p_\sigma(x)}\right)^{\gamma}\frac{\dot{p}_\sigma(x)}{p_\sigma(x)}$
is bounded. Since

$$\left(\frac{p_\alpha(x)}{p_\sigma(x)}\right)^{\gamma}\frac{\dot{p}_\sigma(x)}{p_\sigma(x)}=\frac{\sigma^{\gamma-1}}{\alpha^{\gamma}}\left[\left(\frac{x-m}{\sigma}\right)^2-1\right]\exp\left\{-\frac{\gamma}{2}\left[\left(\frac{x-m}{\alpha}\right)^2-\left(\frac{x-m}{\sigma}\right)^2\right]\right\} \tag{20}$$

boundedness of $\mathrm{IF}(x;T_\alpha,P_\sigma)$ holds when $\gamma>0$ and $\alpha<\sigma$ or when $\gamma<0$ and
$\alpha>\sigma$, cases in which the conditions of Proposition 4 are satisfied. A simple
calculation shows that these choices of $\gamma$ and $\alpha$ assure that $\int m''(\sigma,\alpha)\,dP_\sigma$ is
finite and non zero. However, when using the modified Kullback-Leibler
divergence ($\gamma=0$), none of the estimators $\widehat{\theta}_n(\alpha)$ is B-robust, the function (20) being
unbounded. These aspects can also be observed in Figure 1, which presents
influence functions for different divergences when $\sigma=1$ and $\alpha=1.9$. The
negative values of the influence function in a neighborhood of 0 are explained
by the decrease of the variance estimate when oversampling close to the mean.

The asymptotic relative efficiency of an estimator is the ratio of the asymptotic
variance of the maximum likelihood estimator to that of the estimator in
question. For the scale normal model, the choice of $\alpha$ close to $\sigma$ assures a
good efficiency of $\widehat{\theta}_n(\alpha)$ and also the B-robustness property. Then, the bigger
the value of $|\gamma|$, the smaller the gross error sensitivity of the estimator.
For example, for $\sigma=1$ and $\alpha=0.99$, the efficiency of $\widehat{\theta}_n(\alpha)$ is 0.9803 when
$\gamma=0.5$, 0.9615 when $\gamma=1$, 0.9266 when $\gamma=2$ and 0.8947 when $\gamma=3$,
the most B-robust estimator corresponding to $\gamma=3$. As can be inferred from
Figure 1, the curves $\mathrm{IF}^2(x;T_\alpha,P_\sigma)$ are ordered decreasingly with respect to
$|\gamma|$. Therefore, large values of $|\gamma|$ lead to small gross error sensitivities and low
efficiencies, since the asymptotic variance of $\widehat{\theta}_n(\alpha)$ is $\left[\int\mathrm{IF}^2(x;T_\alpha,P_\sigma)\,dP_\sigma\right]^{-1}$
(see also Hampel et al. [11] for this formula).

For scale models, the conditions of Proposition 4 assure that $\widehat{\theta}_n(\alpha)$ and the
corresponding divergence estimator $\widehat{\phi}_n(\alpha,\theta_0)$ are B-robust.
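The boundedness discussion for the normal scale model can be checked numerically. The sketch below evaluates $\mathrm{IF}(x;T_\alpha,P_\sigma)$ with $m=0$ using the right-hand side of (20). It uses the identity $\int m''(\sigma,\alpha)\,dP_\sigma=-\int(p_\alpha/p_\sigma)^{\gamma}(\dot{p}_\sigma/p_\sigma)^2\,dP_\sigma$, which we obtained by differentiating the estimating equation (10) at $\theta=\sigma$; this identity, the function names and the Riemann-sum integration grid are assumptions of the illustration. It is only meant for parameter choices where the integrals are finite (for instance $\gamma<0$ with $\alpha>\sigma$, or $\gamma=0$).

```python
import numpy as np

GRID = np.linspace(-40.0, 40.0, 80001)
DX = GRID[1] - GRID[0]

def weight_times_score(x, sigma, alpha, gamma):
    """Right-hand side of (20) with m = 0."""
    log_ratio = np.log(sigma / alpha) - 0.5 * x**2 * (1.0 / alpha**2 - 1.0 / sigma**2)
    return np.exp(gamma * log_ratio) * (x**2 / sigma**2 - 1.0) / sigma

def expectation(values, sigma):
    """Riemann-sum approximation of the integral of values * p_sigma over GRID."""
    density = np.exp(-0.5 * (GRID / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.sum(values * density) * DX

def influence_function(x, sigma, alpha, gamma):
    ws = weight_times_score(GRID, sigma, alpha, gamma)
    score = (GRID**2 / sigma**2 - 1.0) / sigma
    centre = expectation(ws, sigma)             # integral term inside the braces of (19)
    curvature = expectation(ws * score, sigma)  # minus the integral of m''(sigma, alpha)
    return (weight_times_score(np.asarray(x, dtype=float), sigma, alpha, gamma) - centre) / curvature

xs = np.linspace(-50.0, 50.0, 2001)
if_robust = influence_function(xs, sigma=1.0, alpha=1.9, gamma=-1.0)
if_mle = influence_function(xs, sigma=1.0, alpha=1.9, gamma=0.0)
print(np.max(np.abs(if_robust)), np.max(np.abs(if_mle)))
```

For $\gamma=-1$ and $\alpha=1.9$ the computed curve is negative near the origin and flattens out for large $|x|$, while for $\gamma=0$ it reduces to $(x^2-1)/2$ and keeps growing, in line with the discussion of Figure 1.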



3.2.2 Location models 



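It holds $p_\theta(x)=p(x-\theta)$.

Proposition 5 For location models, if the condition

$$\sup_x\left|\left(\frac{p(x-\alpha)}{p(x-\theta_0)}\right)^{\gamma}\frac{\partial}{\partial\theta}\log p(x-\theta_0)\right|<\infty \tag{21}$$

is satisfied, then $\widehat{\theta}_n(\alpha)$ is B-robust.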



For the Cauchy density the maximum likelihood estimator exists, it is consistent,
efficient and B-robust and all the estimators $\widehat{\theta}_n(\alpha)$ exist and are B-robust.
Indeed, condition (21) writes

$$\sup_x\left|\left(\frac{1+(x-\theta_0)^2}{1+(x-\alpha)^2}\right)^{\gamma}\frac{2(x-\theta_0)}{1+(x-\theta_0)^2}\right|<\infty$$

and is fulfilled for any $\gamma$ and any $\alpha$. Also, the integral $\int m''(\theta_0,\alpha)\,dP_{\theta_0}$ exists
and is different from zero for any $\gamma$ and any $\alpha$. This is quite natural since
sampling from the Cauchy law makes outliers and large sample points equivalent, due
to heavy tails. However it is known that the likelihood equation for the Cauchy
distribution has multiple roots. The number of solutions behaves asymptotically
as two times a Poisson($1/\pi$) variable plus 1 (see van der Vaart [23] p.
74). A possible selection rule for the estimate is to check the nearly common
estimates for different $\alpha$ and $\phi$-divergences. Figure 2 presents influence
functions $\mathrm{IF}(x;T_\alpha,P_{\theta_0})$, when $\gamma\in\{-1,0,1,2,3\}$, $\theta_0=0.5$ and $\alpha=0.8$. For
these choices of $\theta_0$ and $\alpha$, the efficiency of $\widehat{\theta}_n(\alpha)$ is 0.9775 when $\gamma=1$, 0.9208
when $\gamma=2$, 0.8508 when $\gamma=3$. Here, when $\gamma$ increases, the decrease of the
efficiency is worsened by a loss in B-robustness. In this respect, the maximum
likelihood estimator appears as a good choice in terms of robustness and
efficiency.

Fig. 2. Influence functions $\mathrm{IF}(x;T_\alpha,P_{\theta_0})$ for the Cauchy location model, when the
true location parameter is $\theta_0=0.5$ and $\alpha=0.8$.

In the case of the logistic location model, a simple calculation shows that the
condition (21) is fulfilled for any $\gamma$ and any $\alpha$. Also, the integral $\int m''(\theta_0,\alpha)\,dP_{\theta_0}$
exists and is different from zero for any $\gamma$ and any $\alpha$. These conditions entail
the fact that all the estimators $\widehat{\theta}_n(\alpha)$ are B-robust. Figure 4 presents influence
functions $\mathrm{IF}(x;T_\alpha,P_{\theta_0})$, when $\gamma\in\{-1,0,0.5,1,2,3\}$, $\theta_0=1$ and $\alpha=1.5$.
As in the case of the Cauchy model, when $\gamma$ increases, the decrease of the
efficiency is worsened by the increase of the gross error sensitivity, such that
the maximum likelihood estimator appears again as a good choice in terms of
robustness and efficiency.

On the other hand, for the mean of the normal law, none of the estimators
$\widehat{\theta}_n(\alpha)$ is B-robust, their influence functions being always unbounded.

Fig. 3. Influence functions $\mathrm{IF}(x;U_\alpha,P_{\theta_0})$ for the Cauchy location model, when the
true location parameter is $\theta_0=0.5$ and $\alpha=0.8$.




Fig. 4. Influence functions $\mathrm{IF}(x;T_\alpha,P_{\theta_0})$ for the logistic location model, when the
true location parameter is $\theta_0=1$ and $\alpha=1.5$.



In the case of the Cauchy model, as well as in the case of the logistic model,
$\mathrm{IF}(x;U_\alpha,P_{\theta_0})$ is bounded for any $\gamma$ and any $\alpha$. In Figure 3, respectively in
Figure 5, we present such influence functions for different choices of $\gamma$. Thus,
for these two location models, all the estimators $\widehat{\phi}_n(\alpha,\theta_0)$ are B-robust.

Fig. 5. Influence functions $\mathrm{IF}(x;U_\alpha,P_{\theta_0})$ for the logistic location model, when the
true location parameter is $\theta_0=1$ and $\alpha=1.5$.

4 Robust tests based on divergence estimators 

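4.1 Asymptotic results for contaminated observations

This subsection presents some asymptotic results that are necessary in
order to analyze the robustness of some tests based on divergence estimators.
These asymptotic results are obtained for contaminated observations, namely
$X_1,\dots,X_n$ are i.i.d. with

$$P^{P}_{n,\varepsilon,x}:=\left(1-\frac{\varepsilon}{\sqrt{n}}\right)P_{\theta_n}+\frac{\varepsilon}{\sqrt{n}}\,\delta_x, \tag{22}$$

where $\theta_n=\theta_0+\Delta/\sqrt{n}$, $\Delta$ being an arbitrary vector from $\mathbb{R}^d$.
For $\alpha$ fixed consider the following conditions:

(C.1) The function $\theta\mapsto m(\theta,\alpha)$ is $C^3$ for all $x$ and all partial derivatives
of order 3 of $\theta\mapsto m(\theta,\alpha)$ are dominated by some $P_{\theta_n}$-integrable function
$x\mapsto H(x)$ with the property that $\int H^2\,dP_{\theta_n}$ is finite, for any $n$ and any $\Delta$.

(C.2) $\int m(\theta_0,\alpha)\,dP_{\theta_n}$ and $\int m^2(\theta_0,\alpha)\,dP_{\theta_n}$ are finite, for any $n$ and any $\Delta$.

(C.3) $\int m'(\theta_0,\alpha)\,dP_{\theta_n}$ and $\int m'(\theta_0,\alpha)\,m'(\theta_0,\alpha)^{t}\,dP_{\theta_n}$ exist, for any $n$ and any
$\Delta$.

(C.4) $\int m''(\theta_0,\alpha)\,dP_{\theta_n}$ and $\int m''(\theta_0,\alpha)^2\,dP_{\theta_n}$ exist, for any $n$ and any $\Delta$.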

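The estimators $\widehat{\theta}_n(\alpha)$ have good properties with respect to contamination in
terms of consistency.

Proposition 6 If the conditions (C.1), (C.3) and (C.4) are satisfied, then

$$\sqrt{n}\left(\widehat{\theta}_n(\alpha)-T_\alpha(P^{P}_{n,\varepsilon,x})\right)=O_P(1).$$

Also, $\widehat{\phi}_n(\alpha,\theta_0)$ enjoys normal convergence under (22).

Proposition 7 If $\alpha\neq\theta_0$ and the conditions (C.1)-(C.4) are satisfied, then

$$\frac{\sqrt{n}\left(\widehat{\phi}_n(\alpha,\theta_0)-U_\alpha(P^{P}_{n,\varepsilon,x})\right)}{\left[\int\mathrm{IF}^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y)\right]^{1/2}}$$

converges in distribution to a standard normal variable.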
4.2 Robust tests based on divergence estimators

In this subsection we propose tests based on dual $\phi$-divergence estimators and
study their robustness properties. We mention that the use of the dual form
of a divergence to derive robust tests was discussed in a different context by
Broniatowski and Leorato [7] in the case of the Neyman $\chi^2$ divergence.

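For testing the hypothesis $\theta=\theta_0$ against the alternative $\theta\neq\theta_0$, consider the
test of level $\alpha_0$ defined by the test statistic $\widehat{\phi}_n:=\widehat{\phi}_n(\alpha,\theta_0)$ with $\alpha\neq\theta_0$ and
by the critical region

$$C:=\left\{\left|\frac{\sqrt{n}\left(\widehat{\phi}_n-\phi(\alpha,\theta_0)\right)}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right|\ge q_{1-\frac{\alpha_0}{2}}\right\}$$

where $q_{1-\frac{\alpha_0}{2}}$ is the $(1-\frac{\alpha_0}{2})$-quantile of the standard normal distribution.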
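The decision rule defined by the critical region $C$ can be sketched as follows. The function name and its arguments are hypothetical: `phi_hat` stands for the value of the test statistic, `phi_null` for $\phi(\alpha,\theta_0)$ and `if_variance` for $\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)$, all of which are assumed to have been computed beforehand for the model at hand.

```python
import math
from statistics import NormalDist

def reject_null(phi_hat, phi_null, if_variance, n, level=0.05):
    """Return True when the standardized statistic falls in the critical region C."""
    quantile = NormalDist().inv_cdf(1.0 - level / 2.0)   # (1 - level/2)-quantile of N(0, 1)
    standardized = math.sqrt(n) * (phi_hat - phi_null) / math.sqrt(if_variance)
    return abs(standardized) >= quantile

# Illustrative numbers only (not taken from the paper's simulations).
print(reject_null(phi_hat=0.30, phi_null=0.2237, if_variance=0.5, n=1000))
print(reject_null(phi_hat=0.23, phi_null=0.2237, if_variance=0.5, n=1000))
```

The rule is two-sided: both unusually large and unusually small values of the estimated divergence lead to rejection.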

Due to the asymptotic normality of $\widehat{\phi}_n$, for $n$ large, the level writes as

$$\alpha_0\simeq P_{\theta_0}\left(\left|\frac{\sqrt{n}\left(\widehat{\phi}_n-\phi(\alpha,\theta_0)\right)}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right|\ge q_{1-\frac{\alpha_0}{2}}\right) \tag{23}$$

$$=P_{\theta_0}\left(\left|\widehat{\phi}_n-\phi(\alpha,\theta_0)\right|\ge\frac{1}{\sqrt{n}}\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}q_{1-\frac{\alpha_0}{2}}\right) \tag{24}$$

$$=2P_{\theta_0}\left(\widehat{\phi}_n\ge k_n(\alpha_0)\right) \tag{25}$$

where $k_n(\alpha_0)=\frac{1}{\sqrt{n}}\,q_{1-\frac{\alpha_0}{2}}\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}+\phi(\alpha,\theta_0)$.

We work with the form (25) of the level and consequently of the probability
to reject the null hypothesis, this being easier to handle in the proofs of the
results that follow.

Consider the sequence of contiguous alternatives $\theta_n=\theta_0+\Delta n^{-1/2}$, where $\Delta$
is any vector from $\mathbb{R}^d$. When $\theta_n$ tends to $\theta_0$, the contamination must converge
to 0 at the same rate, to avoid the overlapping between the neighborhood
of the hypothesis and that of the alternative (see Hampel et al. [11], p. 198
and Heritier and Ronchetti [12]). Therefore we consider the contaminated
distributions

$$P^{L}_{n,\varepsilon,x}=\left(1-\frac{\varepsilon}{\sqrt{n}}\right)P_{\theta_0}+\frac{\varepsilon}{\sqrt{n}}\,\delta_x \tag{26}$$

for the level and

$$P^{P}_{n,\varepsilon,x}=\left(1-\frac{\varepsilon}{\sqrt{n}}\right)P_{\theta_n}+\frac{\varepsilon}{\sqrt{n}}\,\delta_x \tag{27}$$

for the power.

The asymptotic level (the asymptotic power) under (26) (under (27)) will be
evaluated now.

Let $\beta_0=\lim_{n\to\infty}2P_{\theta_n}(\widehat{\phi}_n>k_n(\alpha_0))$ be the asymptotic power of the test
under the family of alternatives $P_{\theta_n}$. The test is robust with respect to the
power if the limit of the powers under the contaminated alternatives stays
in a bounded neighborhood of $\beta_0$, so that the role of the contamination is
somehow controlled. Also, the test is robust with respect to the level if the
limit of the level under the contaminated null distributions stays in a bounded
neighborhood of $\alpha_0$.

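Let $P_{n,\varepsilon,x}=2P^{P}_{n,\varepsilon,x}(\widehat{\phi}_n>k_n(\alpha_0))$. In the same vein as in Dell'Aquila and
Ronchetti [10] it holds:

Proposition 8 If the conditions (C.1)-(C.4) are fulfilled, then the
asymptotic power of the test under $P^{P}_{n,\varepsilon,x}$ is given by

$$\lim_{n\to\infty}P_{n,\varepsilon,x}=2-2\Phi\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)-\frac{\Delta^{t}c}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}-\varepsilon\,\frac{\mathrm{IF}(x;U_\alpha,P_{\theta_0})}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right) \tag{28}$$

where $c=\int m(\theta_0,\alpha,y)\,\frac{\dot{p}_{\theta_0}(y)}{p_{\theta_0}(y)}\,dP_{\theta_0}(y)$ and $\Phi$ is the cumulative distribution
function of the standard normal.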

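A Taylor expansion with respect to $\varepsilon$ yields

$$\lim_{n\to\infty}P_{n,\varepsilon,x}=2-2\Phi\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)-\frac{\Delta^{t}c}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right)
+2\varepsilon f\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)-\frac{\Delta^{t}c}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right)\frac{\mathrm{IF}(x;U_\alpha,P_{\theta_0})}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}+o(\varepsilon)$$

$$=\beta_0+2\varepsilon f\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)-\frac{\Delta^{t}c}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right)\frac{\mathrm{IF}(x;U_\alpha,P_{\theta_0})}{\left[\int\mathrm{IF}^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}+o(\varepsilon)$$

where $\beta_0$ is the asymptotic power for the non contaminated model and $f$ is
the density of the standard normal distribution.

In order to limit the bias in the power of the test it is sufficient to bound the
influence function $\mathrm{IF}(x;U_\alpha,P_{\theta_0})$. Bounding the influence function is therefore
enough to maintain the power in a pre-specified band around $\beta_0$.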

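Let $L_{n,\varepsilon,x} = 2P^{L}_{n,\varepsilon,x}(\hat\phi_n > k_n(\alpha_0))$. Putting
$\Delta = 0$ in (28) yields:

Proposition 9 If the conditions (C.1)-(C.4) are fulfilled, then the asymptotic
level of the test under $P^{L}_{n,\varepsilon,x}$ is given by

$$\lim_{n\to\infty} L_{n,\varepsilon,x} = 2 - 2\Phi\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right) - \frac{\varepsilon\, IF(x;U_\alpha,P_{\theta_0})}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right)$$
$$= \alpha_0 + 2\varepsilon\, f\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)\right)\frac{IF(x;U_\alpha,P_{\theta_0})}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}} + o(\varepsilon).$$

Hence, when $IF(x;U_\alpha,P_{\theta_0})$ is bounded, $L_{n,\varepsilon,x}$ remains within
pre-specified bounds around $\alpha_0$.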
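The quality of the first-order expansion of the level can be checked numerically. The sketch below compares the exact limit $2 - 2\Phi(\Phi^{-1}(1-\alpha_0/2) - \varepsilon\, IF/\sigma_{IF})$ with its linearization in $\varepsilon$, obtained by differentiating that expression at $\varepsilon = 0$. Function and argument names are hypothetical; `if_x` and `sigma_if` are assumed to be supplied by the user for the model of interest.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def asymptotic_level(alpha0, eps, if_x, sigma_if):
    """Exact limit of the level under contamination (Delta = 0 in (28))."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha0 / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z - eps * if_x / sigma_if)


def linearized_level(alpha0, eps, if_x, sigma_if):
    """First-order expansion in eps of asymptotic_level around eps = 0."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha0 / 2.0)
    return alpha0 + 2.0 * eps * _STD_NORMAL.pdf(z) * if_x / sigma_if
```

For small `eps` the two functions agree up to a term of order `eps**2`, and both stay close to `alpha0` whenever `if_x / sigma_if` is bounded.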

As Proposition 8 and Proposition 9 show, both the asymptotic power of the test
under $P^{P}_{n,\varepsilon,x}$ and the asymptotic level of the test under
$P^{L}_{n,\varepsilon,x}$ are controlled by the influence function of the test
statistic. Hence, the robustness of the test statistic $\hat\phi_n$, as discussed in
the previous section, assures the stability of the test under small arbitrary
departures from the null hypothesis, as well as a good power under small
arbitrary departures from specified alternatives. Figures 3 and 5 provide some
specific values of $\gamma$ and $\alpha$ inducing robust tests for $\theta_0$
corresponding to those models.



5 Simulation results 



Simulations were run in order to examine empirically the performance of the
robust dual $\phi$-divergence estimators and tests. The parametric model
considered was the scale normal model with known mean. We worked with data
generated from the model, as well as with contaminated data.

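To make some comparisons, beside dual $\phi$-divergence estimators we considered
the minimum density power divergence estimators of Basu et al. [1] (MDPDE's) and
the maximum likelihood estimator (MLE). Recall that a MDPDE of a parameter
$\theta$ is obtained as a solution of the equation

$$\int p_\theta^{\beta}(z)\,\dot p_\theta(z)\,dz - \frac{1}{n}\sum_{i=1}^{n} p_\theta^{\beta-1}(X_i)\,\dot p_\theta(X_i) = 0 \qquad (29)$$

with respect to $\theta$, where $\beta > 0$ and $X_1,\dots,X_n$ is a sample from
$P_\theta$. In the case of the scale normal model $\mathcal{N}(m,\sigma)$, equation
(29) writes as

$$\frac{1}{n}\sum_{i=1}^{n}\left[1-\left(\frac{X_i-m}{\sigma}\right)^2\right]\exp\left(-\frac{\beta}{2}\left(\frac{X_i-m}{\sigma}\right)^2\right) = \frac{\beta}{(1+\beta)^{3/2}}$$

and the MDPDE of the parameter $\sigma$ is robust for any $\beta > 0$.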
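As an illustration, the MDPDE of the scale of a normal model with known mean can be computed by solving the estimating equation of Basu et al. specialized to that model, namely $\frac{1}{n}\sum_i [1 - z_i^2]\exp(-\beta z_i^2/2) = \beta/(1+\beta)^{3/2}$ with $z_i = (X_i - m)/\sigma$. The sketch below uses plain bisection; the function name, the bracketing interval `[lo, hi]` and the tolerance are assumptions made for the example, not choices documented in the paper.

```python
import math
import random


def mdpde_scale(sample, beta, mean=0.0, lo=0.2, hi=5.0, tol=1e-10):
    """MDPDE of sigma for N(mean, sigma), mean known, by bisection.

    Solves (1/n) sum (1 - z_i^2) exp(-beta z_i^2 / 2) = beta / (1 + beta)^1.5
    with z_i = (x_i - mean) / sigma. The bracket [lo, hi] is an assumption.
    """
    target = beta / (1.0 + beta) ** 1.5

    def g(sigma):
        total = 0.0
        for x in sample:
            z2 = ((x - mean) / sigma) ** 2
            total += (1.0 - z2) * math.exp(-0.5 * beta * z2)
        return total / len(sample) - target

    if g(lo) >= 0.0 or g(hi) <= 0.0:
        raise ValueError("estimating equation has no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(96)] + [10.0] * 4
    print("MDPDE (beta = 0.5):", mdpde_scale(data, beta=0.5))
    print("MLE:", math.sqrt(sum(x * x for x in data) / len(data)))
```

The exponential weight makes the contribution of an observation at $x = 10$ negligible for $\sigma$ near 1, which is why the solution is barely affected by such outliers, whereas the MLE is not.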

In a first Monte Carlo experiment the data were generated from the scale normal
model $\mathcal{N}(0,1)$ with mean $m = 0$ known, $\sigma = 1$ being the parameter
of interest. We considered different choices for the tuning parameter $\alpha$ and
for the Cressie-Read divergence to compute D$\phi$E's, and different choices for
the tuning parameter $\beta$ in order to compute MDPDE's. For each configuration
considered, 5000 samples of size $n = 100$ were generated from the model, and for
each sample the D$\phi$E's, MDPDE's and MLE were obtained.

In Table 1 we present the results of the simulations, showing simulation-based
estimates of the bias and MSE given by

$$\widehat{\mathrm{Bias}} = \frac{1}{n_s}\sum_{i=1}^{n_s}(\hat\sigma_i - \sigma), \qquad \widehat{\mathrm{MSE}} = \frac{1}{n_s}\sum_{i=1}^{n_s}(\hat\sigma_i - \sigma)^2,$$

where $n_s$ denotes the number of samples (5000 in our case) and $\hat\sigma_i$
denotes an estimate of $\sigma$ for the $i$th sample. Examination of the table
shows that D$\phi$E's give as good results as MDPDE's or MLE.
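The two summary statistics reported in the tables are simple averages over the Monte Carlo replications. A minimal sketch follows; the function name is hypothetical, and `estimates` is assumed to hold one estimate per simulated sample.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo estimates of bias and mean squared error.

    estimates  : sequence of estimates, one per simulated sample
    true_value : the parameter value used to generate the data
    """
    n_s = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_s
    mse = sum((e - true_value) ** 2 for e in estimates) / n_s
    return bias, mse
```

Note that the empirical MSE combines squared bias and variance, so a small bias alone does not guarantee a small MSE.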

In a second Monte Carlo experiment, we first generated samples with 100
observations, namely 98 coming from $\mathcal{N}(0,1)$ and 2 outliers $x = 10$, and
then we generated samples with 100 observations, namely 96 from
$\mathcal{N}(0,1)$ and 4 outliers $x = 10$. The tuning parameters were the same as
in the noncontaminated case, and again $n_s = 5000$. The simulation results are
given in Table 2. As can be seen, the results for D$\phi$E's and MDPDE's are
comparable, and both are better than the results for the MLE in both cases.

A close look at the results of the simulations shows that the D$\phi$E performs
well under the model, when no outliers are generated; indeed the best results are
obtained when $\gamma = -0.1$, whether $\alpha = 1.5$ or $\alpha = 1.9$. The
performance of the estimator under the model is comparable to that of some
MDPDE's in terms of empirical MSE: indeed the MSE for the D$\phi$E with
$\gamma = -0.1$ parallels that of MDPDE's for small $\beta$. It is also slightly
smaller than the one obtained with the MLE. Under contamination, the D$\phi$E
with $\gamma = -0.5$ clearly yields the most robust estimate, and the empirical
MSE is very small, indicating a strong stability of the estimate. It compares
favorably with the MDPDE for all $\beta$, whether $\alpha = 1.5$ or
$\alpha = 1.9$. The simulation with 4 outliers at $x = 10$ provides clear
evidence of the properties of the D$\phi$E with $\gamma = -0.5$. Also, small
values of $\beta$ give results similar to those of large negative values of
$\gamma$, whatever $\alpha$, under contamination. Although $\gamma = -0.1$ is a
good alternative to the MLE under the model, $\gamma = -0.5$ behaves quite well in
terms of bias while keeping a small empirical MSE under the model or under
contamination. These results are in full accordance with Figure 1; indeed the
influence function is constant, close to 0, for large values of $x$.

Thus, the D$\phi$E is shown to be an attractive alternative to both the MLE and
the MDPDE in these settings.
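The contamination scheme of the second experiment is easy to reproduce for the MLE, which has a closed form when the mean is known. The sketch below is restricted to the MLE, since the D$\phi$E requires the model-specific criterion; the function names, the seed and the use of 500 replications instead of 5000 are arbitrary choices made for the example.

```python
import math
import random


def scale_mle(sample, mean=0.0):
    """MLE of sigma in the normal scale model with known mean."""
    return math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))


def contaminated_sample(rng, n, n_outliers, outlier=10.0):
    """n - n_outliers draws from N(0, 1) plus n_outliers copies of `outlier`."""
    clean = [rng.gauss(0.0, 1.0) for _ in range(n - n_outliers)]
    return clean + [outlier] * n_outliers


def mean_mle_under_contamination(n_outliers, n_samples=500, n=100, seed=12345):
    """Average of the MLE over n_samples contaminated samples of size n."""
    rng = random.Random(seed)
    estimates = [scale_mle(contaminated_sample(rng, n, n_outliers))
                 for _ in range(n_samples)]
    return sum(estimates) / n_samples


if __name__ == "__main__":
    for k in (2, 4):
        print(k, "outliers: mean MLE =", mean_mle_under_contamination(k))
```

With 2 and 4 outliers the average MLE lands near $\sqrt{0.98 + 2} \approx 1.73$ and $\sqrt{0.96 + 4} \approx 2.23$ respectively, in line with the MLE rows of Table 2.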

In order to test the hypothesis $\sigma = 1$ against the alternative
$\sigma \neq 1$, we considered the test statistic

$$\frac{\sqrt{n}\,(\hat\phi_n - \phi(\alpha,\theta_0))}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}$$

(here $\theta_0 = \sigma = 1$). Under the null hypothesis, this test statistic is
asymptotically $\mathcal{N}(0,1)$. We worked with data generated from the model
$\mathcal{N}(0,1)$, as well as with contaminated data. In each case, we simulated
5000 samples and computed the actual levels corresponding to the nominal levels
$\alpha_0 = 0.01, 0.02, \dots, 0.1$. We report the corresponding relative errors.

In Figure 6 we present relative errors for the robust tests applied to the scale
normal model $\mathcal{N}(0,1)$, when the data are generated from the model. The
sample size is $n = 100$, the tuning parameter is $\alpha = 1.9$ and the
Cressie-Read divergences correspond to $\gamma \in \{-1.5, -1, -0.5, -0.1\}$. The
approximation of the level is good for all the divergences considered.

Figure 7 shows relative errors of the robust tests applied to the scale normal
model $\mathcal{N}(0,1)$, for samples with $n = 100$ data, namely 98 data
generated from $\mathcal{N}(0,1)$ and 2 outliers $x = 10$. We considered
$\alpha = 1.9$ and $\gamma \in \{-2, -1.5\}$. Again, the approximation of the level
of the test is good for all the divergences considered.

In Figure 8 we present relative errors of the robust tests applied to the scale
normal model $\mathcal{N}(0,1)$, for samples with $n = 100$ data, namely 96 data
generated from $\mathcal{N}(0,1)$ and 4 outliers $x = 10$. We considered
$\alpha = 1.9$ and $\gamma \in \{-2, -1.5\}$.

Observe that the tests give good results for values of $\gamma$ close to zero when
the data are not contaminated, and for large negative values of $\gamma$ when the
data are contaminated.

Thus, the numerical results show that dual $\phi$-divergence estimates and the
corresponding tests are stable in the presence of some outliers in the sample.

Table 1. Simulation results for D$\phi$E, MDPDE and MLE of the parameter
$\sigma = 1$ when the data are generated from the model $\mathcal{N}(0,1)$.

Estimator                                  $\hat\sigma$   Bias       MSE
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -2     0.99770     -0.00229    0.00917
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -1.5   0.99735     -0.00264    0.00822
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -1     0.99760     -0.00239    0.00698
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -0.5   0.99833     -0.00166    0.00563
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -0.1   0.99799     -0.00200    0.00492
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -2     0.99892     -0.00107    0.01029
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -1.5   0.99841     -0.00158    0.00924
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -1     0.99824     -0.00175    0.00773
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -0.5   0.99839     -0.00160    0.00588
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -0.1   (illegible) (illegible) (illegible)
MDPDE     $\beta$ = 0.1                     0.99894     -0.00105    0.00514
MDPDE     $\beta$ = 0.5                     0.99986     -0.00013    0.00686
MDPDE     $\beta$ = 1                       1.00005      0.00005    0.00927
MDPDE     $\beta$ = 1.5                     1.00074      0.00074    0.01077
MDPDE     $\beta$ = 2                       1.00150      0.00150    0.01165
MDPDE     $\beta$ = 2.5                     1.00294      0.00294    0.01266
MLE                                         0.99743     -0.00256    0.00501

Table 2. Simulation results for D$\phi$E, MDPDE and MLE of the parameter
$\sigma = 1$ when 98 data are generated from the model $\mathcal{N}(0,1)$ and 2
outliers $x = 10$ are added, respectively when 96 data are generated from the
model $\mathcal{N}(0,1)$ and 4 outliers $x = 10$ are added.

                                            2 outliers                          4 outliers
Estimator                                  $\hat\sigma$   Bias       MSE        $\hat\sigma$   Bias       MSE
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -2     1.01186      0.01186    0.00914     1.02540      0.02540    0.00946
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -1.5   1.00850      0.00850    0.00816     1.01911      0.01911    0.00833
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -1     1.00499      0.00499    0.00697     1.01210      0.01210    0.00707
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -0.5   1.00171      0.00171    0.00572     1.00526      0.00526    0.00583
D$\phi$E  $\alpha$ = 1.5, $\gamma$ = -0.1   1.09661      0.09661    0.01641     0.99766     -0.00233    0.00088
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -2     1.01589      0.01589    0.01059     1.03547      0.03547    0.01182
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -1.5   1.01236      0.01236    0.00942     1.02840      0.02840    0.01027
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -1     1.00785      0.00785    0.00785     1.01912      0.01912    0.00838
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -0.5   (illegible)  (illegible) (illegible) (illegible)  (illegible) (illegible)
D$\phi$E  $\alpha$ = 1.9, $\gamma$ = -0.1   1.06708      0.06708    0.02241     1.10531      0.10531    0.02083
MDPDE     $\beta$ = 0.1                     1.01117      0.01117    0.00646     1.02676      0.02676    0.00891
MDPDE     $\beta$ = 0.5                     1.00700      0.00700    0.00712     1.01417      0.01417    0.00743
MDPDE     $\beta$ = 1                       1.01406      0.01406    0.00975     1.02892      0.02892    0.01062
MDPDE     $\beta$ = 1.5                     1.01916      0.01916    0.01148     1.03876      0.03876    0.01297
MDPDE     $\beta$ = 2                       1.02233      0.02233    0.01254     1.04448      0.04447    0.01450
MDPDE     $\beta$ = 2.5                     1.02450      0.02450    0.01342     1.04771      0.04771    0.01556
MLE                                         1.72587      0.72587    0.52852     2.22720      1.22720    1.50701

Fig. 6. Relative errors of the robust tests applied to the scale normal model
$\mathcal{N}(0,1)$ when $\alpha = 1.9$ and 100 data are generated from the model
(horizontal axis: nominal level).

Fig. 7. Relative errors of the robust tests applied to the scale normal model
$\mathcal{N}(0,1)$ when $\alpha = 1.9$, 98 data are generated from the model and 2
outliers $x = 10$ are added.

Fig. 8. Relative errors of the robust tests applied to the scale normal model
$\mathcal{N}(0,1)$ when $\alpha = 1.9$, 96 data are generated from the model and 4
outliers $x = 10$ are added.

6 An adaptive choice of the tuning parameter 

At the present stage we only present some heuristics and defer the formal
treatment of this proposal, which lies beyond the scope of the present work.

According to the model and the parameter to be estimated, the choice of 

7 should be considered with respect to the expression ffTTl) which has to be 
bounded. We refer to the examples given in subsection 3.2 for some scale and
location models.

Given a set of observations $X_1,\dots,X_n$, an adaptive choice for $\alpha$ would
aim at reducing the estimated maximal bias caused by an extraneous data point.
Define $\hat\theta_n(\alpha,\gamma)$ as the D$\phi$E of $\theta_0$ on the entire set
of observations. For $1 \le i \le n$, let $\hat\theta^{i}_{n-1}(\alpha,\gamma)$ be the
D$\phi$E of $\theta_0$ built on the leave-one-out data set
$X_1, X_2, \dots, X_{i-1}, X_{i+1}, \dots, X_n$. Define

$$B_n(\alpha,\gamma) := \max_{i} \left|\hat\theta_n(\alpha,\gamma) - \hat\theta^{i}_{n-1}(\alpha,\gamma)\right|,$$

which measures the maximal bias caused by a single outlier, and

$$\alpha^*(\gamma) := \arg\inf_{\alpha} B_n(\alpha,\gamma).$$
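The leave-one-out heuristic can be sketched generically. In the code below the `estimator` argument stands in for the D$\phi$E at a fixed $\gamma$; since the D$\phi$E criterion is model-specific, the example plugs in a toy clipped second-moment scale estimate, `clipped_scale`, which is purely a hypothetical stand-in and not the estimator of the paper. The infimum over $\alpha$ is replaced by a minimum over a finite grid, which is also an assumption of the sketch.

```python
import math


def max_loo_bias(sample, alpha, estimator):
    """B_n(alpha): largest change of the estimate when one point is left out."""
    full = estimator(sample, alpha)
    return max(
        abs(full - estimator(sample[:i] + sample[i + 1:], alpha))
        for i in range(len(sample))
    )


def adaptive_alpha(sample, alpha_grid, estimator):
    """Grid value of alpha minimizing the leave-one-out maximal bias."""
    return min(alpha_grid, key=lambda a: max_loo_bias(sample, a, estimator))


def clipped_scale(sample, alpha):
    """Toy stand-in estimator (NOT the DphiE): root mean of squares clipped at alpha^2."""
    return math.sqrt(sum(min(x * x, alpha * alpha) for x in sample) / len(sample))


if __name__ == "__main__":
    data = [0.5, -1.2, 0.3, 10.0, -0.7, 1.1]
    print(adaptive_alpha(data, [1.0, 3.0, 20.0], clipped_scale))
```

Each evaluation of `max_loo_bias` requires $n + 1$ fits, so the cost of the adaptive choice grows with both the sample size and the size of the grid.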



7 Proofs 

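Proof of Proposition 1

For fixed $\alpha$, the estimators $\hat\theta_n(\alpha)$ are M-estimators. In
accordance with the theory of M-estimators (see for example van der Vaart [25]),
the so-called $\psi$-function corresponding to $\hat\theta_n(\alpha)$ is

$$\psi_\alpha(x,\theta) = m'(\theta,\alpha,x)$$

and the influence function of $T_\alpha$ is

$$IF(x;T_\alpha,P_{\theta_0}) = [M(\psi_\alpha,P_{\theta_0})]^{-1}\,\psi_\alpha(x,T_\alpha(P_{\theta_0})) \qquad (30)$$

where

$$M(\psi_\alpha,P_{\theta_0}) = -\int\left[\frac{\partial}{\partial\theta}\psi_\alpha(y,\theta)\right]_{\theta_0} dP_{\theta_0}(y) = -\int m''(\theta_0,\alpha,y)\,dP_{\theta_0}(y).$$

Using the Fisher consistency of the functional $T_\alpha$,

$$\psi_\alpha(x,T_\alpha(P_{\theta_0})) = \psi_\alpha(x,\theta_0) = -\int\varphi''\left(\frac{p_\alpha}{p_{\theta_0}}\right)\frac{p_\alpha^2}{p_{\theta_0}^2}\,\dot p_{\theta_0}\,d\lambda + \varphi''\left(\frac{p_\alpha(x)}{p_{\theta_0}(x)}\right)\frac{p_\alpha^2(x)}{p_{\theta_0}^3(x)}\,\dot p_{\theta_0}(x),$$

which substituted in (30) leads to the announced result. □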
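Formula (30) can be checked on a case where everything is available in closed form. The sketch below does this for the MLE of the scale in the normal model with mean 0, used here only as a familiar M-estimator and not as the D$\phi$E: its $\psi$-function is $\psi(x,\sigma) = x^2/\sigma^3 - 1/\sigma$, so that $M = 2/\sigma^2$ and (30) gives $IF(x) = (x^2 - \sigma^2)/(2\sigma)$, while the functional at the contaminated model is $\sqrt{(1-\varepsilon)\sigma^2 + \varepsilon x^2}$. Function names and the step `eps` are assumptions of the example.

```python
import math


def influence_formula(x, sigma):
    """IF of the scale MLE from M^{-1} * psi, with M = 2 / sigma^2."""
    psi = x * x / sigma ** 3 - 1.0 / sigma
    return psi * sigma ** 2 / 2.0


def influence_finite_difference(x, sigma, eps=1e-6):
    """Numerical derivative in eps of T((1 - eps) P_sigma + eps delta_x) at 0."""
    contaminated = math.sqrt((1.0 - eps) * sigma ** 2 + eps * x * x)
    return (contaminated - sigma) / eps
```

The two functions agree up to the discretization error of the finite difference. The quadratic growth in `x` shows that this influence function is unbounded, which is consistent with the poor behavior of the MLE under contamination reported in Table 2.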
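Proof of Proposition 2

Let $\varepsilon > 0$ and let $P_{\theta_0\varepsilon x} = (1-\varepsilon)P_{\theta_0} + \varepsilon\delta_x$
be the contaminated model. Then

$$U_\alpha(P_{\theta_0\varepsilon x}) = \int m(T_\alpha(P_{\theta_0\varepsilon x}),\alpha,y)\,dP_{\theta_0\varepsilon x}(y)$$
$$= (1-\varepsilon)\int m(T_\alpha(P_{\theta_0\varepsilon x}),\alpha,y)\,dP_{\theta_0}(y) + \varepsilon\, m(T_\alpha(P_{\theta_0\varepsilon x}),\alpha,x)$$

and differentiation yields

$$IF(x;U_\alpha,P_{\theta_0}) = \frac{\partial}{\partial\varepsilon}\left[U_\alpha(P_{\theta_0\varepsilon x})\right]_{\varepsilon=0}$$
$$= -\int m(\theta_0,\alpha,y)\,dP_{\theta_0}(y) + IF(x;T_\alpha,P_{\theta_0})^{t}\int m'(\theta_0,\alpha,y)\,dP_{\theta_0}(y) + m(\theta_0,\alpha,x)$$
$$= -\phi(\alpha,\theta_0) + m(\theta_0,\alpha,x).$$

□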

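Proof of Proposition 3

For notational clearness, define $T : \Theta\times\mathcal{M} \to \Theta$,

$$T(\alpha,P) := T_\alpha(P).$$

For each $\alpha\in\Theta$, the definition of $T(\alpha,P)$ leads to

$$\int m'(T(\alpha,P),\alpha,y)\,dP(y) = 0.$$

By the very definition of $V(P)$ and $T(V(P),P)$, they both obey

$$\int m'(T(V(P),P),V(P),y)\,dP(y) = 0,$$
$$\int\frac{\partial}{\partial\alpha}\left[m(T(\alpha,P),\alpha,y)\right]_{V(P)}dP(y) = 0. \qquad (31)$$

Denoting $n(\theta,\alpha,y) = \frac{\partial}{\partial\alpha}m(\theta,\alpha,y)$,

$$n(\theta,\alpha,y) = \int\varphi''\left(\frac{p_\alpha}{p_\theta}\right)\frac{\dot p_\alpha}{p_\theta}\,dP_\alpha + \int\varphi'\left(\frac{p_\alpha}{p_\theta}\right)\frac{\dot p_\alpha}{p_\alpha}\,dP_\alpha$$
$$\quad - \left\{\varphi''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha(y)}{p_\theta(y)}\frac{\dot p_\alpha(y)}{p_\theta(y)} + \varphi'\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{\dot p_\alpha(y)}{p_\theta(y)} - \varphi'\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{\dot p_\alpha(y)}{p_\theta(y)}\right\}$$
$$= \int\left\{\varphi''\left(\frac{p_\alpha}{p_\theta}\right)\frac{p_\alpha}{p_\theta} + \varphi'\left(\frac{p_\alpha}{p_\theta}\right)\right\}\frac{\dot p_\alpha}{p_\alpha}\,dP_\alpha - \varphi''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha(y)}{p_\theta^2(y)}\,\dot p_\alpha(y).$$

From (31),

$$\int m'(T(V(P),P),V(P),y)\,dP(y) = 0,$$
$$\frac{\partial}{\partial\alpha}\left[T(\alpha,P)\right]_{V(P)}\int m'(T(V(P),P),V(P),y)\,dP(y) + \int n(T(V(P),P),V(P),y)\,dP(y) = 0,$$

and consequently

$$\int m'(T(V(P),P),V(P),y)\,dP(y) = 0,$$
$$\int n(T(V(P),P),V(P),y)\,dP(y) = 0.$$

For the contaminated model $P_{\theta_0\varepsilon x} = (1-\varepsilon)P_{\theta_0} + \varepsilon\delta_x$,

$$\int n(T(V(P_{\theta_0\varepsilon x}),P_{\theta_0\varepsilon x}),V(P_{\theta_0\varepsilon x}),y)\,dP_{\theta_0\varepsilon x}(y) = 0$$

and so

$$(1-\varepsilon)\int n(T(V(P_{\theta_0\varepsilon x}),P_{\theta_0\varepsilon x}),V(P_{\theta_0\varepsilon x}),y)\,dP_{\theta_0}(y) + \varepsilon\, n(T(V(P_{\theta_0\varepsilon x}),P_{\theta_0\varepsilon x}),V(P_{\theta_0\varepsilon x}),x) = 0.$$

Now differentiation with respect to $\varepsilon$ at $\varepsilon = 0$ yields

$$-\int n(\theta_0,\theta_0,y)\,dP_{\theta_0}(y) + \int\frac{\partial}{\partial\theta}n(T(\theta_0,P_{\theta_0}),\theta_0,y)\,dP_{\theta_0}(y)\left\{\frac{\partial}{\partial\alpha}T(\theta_0,P_{\theta_0})\,IF(x;V,P_{\theta_0}) + IF(x;T_{\theta_0},P_{\theta_0})\right\}$$
$$\quad + \int\frac{\partial}{\partial\alpha}n(\theta_0,\theta_0,y)\,dP_{\theta_0}(y)\,IF(x;V,P_{\theta_0}) + n(\theta_0,\theta_0,x) = 0. \qquad (32)$$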

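Since

$$\frac{\partial}{\partial\theta}n(\theta,\alpha,y) = \int\left\{-\varphi'''\left(\frac{p_\alpha}{p_\theta}\right)\frac{p_\alpha^2}{p_\theta^2} - 2\varphi''\left(\frac{p_\alpha}{p_\theta}\right)\frac{p_\alpha}{p_\theta}\right\}\frac{\dot p_\alpha\dot p_\theta^{\,t}}{p_\theta}\,d\lambda + \left\{\varphi'''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha^2(y)}{p_\theta^2(y)} + 2\varphi''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha(y)}{p_\theta(y)}\right\}\frac{\dot p_\alpha(y)\dot p_\theta(y)^{t}}{p_\theta^2(y)}$$

and particularly

$$\frac{\partial}{\partial\theta}n(\theta_0,\theta_0,y) = \{\varphi'''(1) + 2\varphi''(1)\}\left\{\frac{\dot p_{\theta_0}(y)\dot p_{\theta_0}(y)^{t}}{p_{\theta_0}^2(y)} - \int\frac{\dot p_{\theta_0}\dot p_{\theta_0}^{\,t}}{p_{\theta_0}^2}\,dP_{\theta_0}\right\},$$

deduce that

$$\int\frac{\partial}{\partial\theta}n(\theta_0,\theta_0,y)\,dP_{\theta_0}(y) = 0. \qquad (33)$$

On the other hand

$$\frac{\partial}{\partial\alpha}n(\theta,\alpha,y) = \int\left\{\varphi'''\left(\frac{p_\alpha}{p_\theta}\right)\frac{p_\alpha}{p_\theta} + 2\varphi''\left(\frac{p_\alpha}{p_\theta}\right)\right\}\frac{\dot p_\alpha\dot p_\alpha^{\,t}}{p_\theta p_\alpha}\,dP_\alpha + \int\left\{\varphi''\left(\frac{p_\alpha}{p_\theta}\right)\frac{p_\alpha}{p_\theta} + \varphi'\left(\frac{p_\alpha}{p_\theta}\right)\right\}\frac{\ddot p_\alpha}{p_\alpha}\,dP_\alpha$$
$$\quad - \left\{\varphi'''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha(y)}{p_\theta(y)} + \varphi''\left(\frac{p_\alpha}{p_\theta}(y)\right)\right\}\frac{\dot p_\alpha(y)\dot p_\alpha(y)^{t}}{p_\theta^2(y)} - \varphi''\left(\frac{p_\alpha}{p_\theta}(y)\right)\frac{p_\alpha(y)}{p_\theta(y)}\frac{\ddot p_\alpha(y)}{p_\theta(y)}$$

and particularly (the term in $\varphi'(1)$ drops out since $\int\ddot p_{\theta_0}\,d\lambda = 0$)

$$\frac{\partial}{\partial\alpha}n(\theta_0,\theta_0,y) = \int\{\varphi'''(1) + 2\varphi''(1)\}\frac{\dot p_{\theta_0}\dot p_{\theta_0}^{\,t}}{p_{\theta_0}^2}\,dP_{\theta_0} + \int\varphi''(1)\frac{\ddot p_{\theta_0}}{p_{\theta_0}}\,dP_{\theta_0}$$
$$\quad - \{\varphi'''(1) + \varphi''(1)\}\frac{\dot p_{\theta_0}(y)\dot p_{\theta_0}(y)^{t}}{p_{\theta_0}^2(y)} - \varphi''(1)\frac{\ddot p_{\theta_0}(y)}{p_{\theta_0}(y)}.$$

As a consequence

$$\int\frac{\partial}{\partial\alpha}n(\theta_0,\theta_0,y)\,dP_{\theta_0}(y) = \varphi''(1)\int\frac{\dot p_{\theta_0}\dot p_{\theta_0}^{\,t}}{p_{\theta_0}^2}\,dP_{\theta_0} = \varphi''(1)\,I_{\theta_0}. \qquad (34)$$

Also

$$n(\theta_0,\theta_0,x) = -\varphi''(1)\frac{\dot p_{\theta_0}(x)}{p_{\theta_0}(x)}. \qquad (35)$$

Using the Fisher consistency of the functional $T_\alpha$ and substituting (33),
(34) and (35) in (32), it holds

$$IF(x;V,P_{\theta_0}) = I_{\theta_0}^{-1}\frac{\dot p_{\theta_0}(x)}{p_{\theta_0}(x)} \qquad (36)$$

and this completes the proof. □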
Proof of Proposition 4
By replacing p e , 



and similarly 



^ 2 I 


p(a 1 x) 7 


a"? | 

^ 2 ! 


p(a~ 1 x) 1 


of | 






j ( p(oi~ l x) 


of? 


I Wo 1 *) 



PaC^) \ 7 P0oO c ) 8q 1 j { Pi a 1;r ) N \ 7 x /'p( a \ 7 p(^o lx ) 



,Pe (x)J p 6o {x) a~* \ \p(9 l x) ) + 9 \p(6 1 x) J p(9 1 x) 

9l l ifp(a^x)Y a fpia-^Y d , (/J _, 



The condition (A.1) together with one of the conditions (A.2) or (A.3)
(depending on the choice of $\gamma$) entails that the function (37) is integrable.
On the other hand, (A.2) or (A.3) together with (A.4) assure that the function in the
above display is bounded. Then IF(x; T a , Pg ) as it is expressed by (ITUl) is a 
bounded function. □ 

Proof of Proposition [5] 

It holds 

\Pe {x)J p{x-9 p- 1 d9 
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and 

f p a (x) V p 6o (x) = f p(x-a) V d 
\Pe (x)J po (x) \p(x-9 )J 89 

Then the condition (I2TI) allows to conclude that IF(x; T a , Pe ), as it is expressed 
by (USD, is bounded. □ 

Proof of Proposition 6

First prove that $\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x}) = o_P(1)$.

It holds $\int m'(\theta_0,\alpha)\,dP_{\theta_0} = 0$ and

$$\int m''(\theta_0,\alpha)\,dP_{\theta_0} = -\int\varphi''\left(\frac{p_\alpha}{p_{\theta_0}}\right)\frac{p_\alpha^2}{p_{\theta_0}^3}\,\dot p_{\theta_0}\dot p_{\theta_0}^{\,t}\,d\lambda =: -S. \qquad (38)$$

The matrix $S$ is symmetric and positive since $\varphi''$ is positive by the
convexity of $\varphi$. Using (C.3) in connection with the Lindeberg-Feller theorem
for triangular arrays we have $\sqrt{n}\int m'(\theta_0,\alpha)\,dP_n = O_P(1)$. Using
(C.4) in connection with the Lindeberg-Feller theorem for triangular arrays yields
$\int m''(\theta_0,\alpha)\,dP_n + S = o_P(1)$.

Now, for any $\theta = \theta_0 + un^{-1/3}$ with $\|u\| \le 1$, a Taylor expansion of
$\int m(\theta,\alpha)\,dP_n$ around $\theta_0$ under (C.1) yields

$$n\int m(\theta,\alpha)\,dP_n - n\int m(\theta_0,\alpha)\,dP_n = n^{2/3}u^{t}\int m'(\theta_0,\alpha)\,dP_n + 2^{-1}n^{1/3}u^{t}\int m''(\theta_0,\alpha)\,dP_n\,u + O_P(1)$$

uniformly on $u$ with $\|u\| \le 1$. Hence

$$n\int m(\theta,\alpha)\,dP_n - n\int m(\theta_0,\alpha)\,dP_n = O_P(n^{1/6}) - 2^{-1}n^{1/3}u^{t}Su + O_P(1)$$

uniformly on $u$ with $\|u\| \le 1$. Hence uniformly on $u$ with $\|u\| = 1$,

$$n\int m(\theta,\alpha)\,dP_n - n\int m(\theta_0,\alpha)\,dP_n \le O_P(n^{1/6}) - 2^{-1}c\,n^{1/3} + O_P(1) \qquad (39)$$

where $c$ is the smallest eigenvalue of the matrix $S$. Note that $c$ is positive
since $S$ is positive definite. In view of (39), by the continuity of
$\theta \mapsto \int m(\theta,\alpha)\,dP_n$, it holds that as $n\to\infty$, with
probability one, $\theta \mapsto \int m(\theta,\alpha)\,dP_n$ attains its maximum at
some point $\hat\theta_n(\alpha)$ in the interior of the ball
$\{\theta : \|\theta - \theta_0\| \le n^{-1/3}\}$, and therefore

$$\hat\theta_n(\alpha) - \theta_0 = o_P(1). \qquad (40)$$


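On the other hand,

$$T_\alpha(P^{P}_{n,\varepsilon,x}) = \theta_0 + \frac{\varepsilon}{\sqrt{n}}\,IF(x;T_\alpha,P_{\theta_0}) + \frac{\Delta}{\sqrt{n}}\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\frac{\sqrt{\varepsilon^2+\Delta^2}}{\sqrt{n}},$$

where $\mathbf{1} := (1,\dots,1)^{t}$ above coincides with
$\frac{\partial}{\partial\Delta}[T_\alpha(P_{\theta_0+\Delta})]_{\Delta=0}$ by the Fisher
consistency of the functional $T_\alpha$, and the function $\rho$ satisfies
$\lim_{n\to\infty}\rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right) = 0$.
Then $T_\alpha(P^{P}_{n,\varepsilon,x}) - \theta_0$ converges to zero as $n\to\infty$.
Combining this with (40) we obtain that
$\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x})$ converges to zero in probability.

In the following, we prove that
$\sqrt{n}(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x})) = O_P(1)$.

By Taylor expansion, there exists $\bar\theta_n$ inside the segment that links
$T_\alpha(P^{P}_{n,\varepsilon,x})$ and $\hat\theta_n(\alpha)$ such that

$$0 = \int m'(\hat\theta_n(\alpha),\alpha)\,dP_n$$
$$= \int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n + \int m''(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n\,(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x}))$$
$$\quad + \frac{1}{2}(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x}))^{t}\int m'''(\bar\theta_n,\alpha)\,dP_n\,(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x})). \qquad (41)$$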

By condition (C.l), using the sup-norm 

„ i n -i n 

|| / m'"(9 n ,a)dP n \\ = ||-£m"'(0 n ,a)(X fe )|| < 
J n k=i n fc=i 

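Applying the Lindeberg-Feller theorem for triangular arrays yields
$\int m'''(\bar\theta_n,\alpha)\,dP_n = O_P(1)$. Then the last term in (41) writes
$o_P(1)(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x}))$.

Under (C.1) and (C.4), by applying a Taylor expansion and repeatedly the
Lindeberg-Feller theorem for triangular arrays,

$$\int m''(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = \frac{1}{n}\sum_{k=1}^{n}m''(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,X_k)$$
$$= \frac{1}{n}\sum_{k=1}^{n}m''(\theta_0,\alpha,X_k) + \frac{\varepsilon}{n\sqrt{n}}\sum_{k=1}^{n}m'''(\theta_0,\alpha,X_k)\,IF(x;T_\alpha,P_{\theta_0})$$
$$\quad + \frac{\Delta}{n\sqrt{n}}\sum_{k=1}^{n}m'''(\theta_0,\alpha,X_k)\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\frac{\sqrt{\varepsilon^2+\Delta^2}}{\sqrt{n}}$$
$$= P_{\theta_0}m''(\theta_0,\alpha) + o_P(1),$$

where $\mathbf{1} := (1,\dots,1)^{t}$ above coincides with
$\frac{\partial}{\partial\Delta}[T_\alpha(P_{\theta_0+\Delta})]_{\Delta=0}$ by the Fisher
consistency of the functional $T_\alpha$, and the function $\rho$ satisfies
$\lim_{n\to\infty}\rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right) = 0$.

Therefore (41) becomes

$$-\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = \left(\int m''(\theta_0,\alpha)\,dP_{\theta_0} + o_P(1)\right)(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x})). \qquad (42)$$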

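We prove that $\sqrt{n}\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n$ is $O_P(1)$.
By Taylor expansion,

$$\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = \frac{1}{n}\sum_{k=1}^{n}m'(\theta_0,\alpha,X_k) + \frac{\varepsilon}{n\sqrt{n}}\sum_{k=1}^{n}m''(\theta_0,\alpha,X_k)\,IF(x;T_\alpha,P_{\theta_0})$$
$$\quad + \frac{\Delta}{n\sqrt{n}}\sum_{k=1}^{n}m''(\theta_0,\alpha,X_k)\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\frac{\sqrt{\varepsilon^2+\Delta^2}}{\sqrt{n}}$$

and therefore

$$\sqrt{n}\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}m'(\theta_0,\alpha,X_k) + \frac{\varepsilon}{n}\sum_{k=1}^{n}m''(\theta_0,\alpha,X_k)\,IF(x;T_\alpha,P_{\theta_0})$$
$$\quad + \frac{\Delta}{n}\sum_{k=1}^{n}m''(\theta_0,\alpha,X_k)\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\sqrt{\varepsilon^2+\Delta^2}.$$

Under (C.3) and (C.4), by applying the Lindeberg-Feller theorem for triangular
arrays it holds $\sqrt{n}\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = O_P(1)$.
Then from (42)

$$\sqrt{n}(\hat\theta_n(\alpha) - T_\alpha(P^{P}_{n,\varepsilon,x})) = O_P(1).$$

□

Proof of Proposition 7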

By Taylor expansion, there exists 9 n inside the segment that links T a (P^ £X ) 
and 9 n (a) such that 



(j) n (a,9 ) = j m(9 n (a),a)dP n 

= J m(T a (P^ x ),a)dP n + J m\T a (P^ x ),aYdP n (9 n ^) ~ ^(P n P £ ,J) + 
+\{9 n {a) -T a (pr £;X )yj m"{T a {P^ x ), oc)dP n {9 n {a) - T a (P^ x )) + 
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Then 

Vn(?„(a,flo) - U a (P^ x )) _ ^i(Jm(T a (P^ x ),a)dP n - U a (P^ x )) + 



[/ IF%; C/ a , P^JrfP^y)] 1 / 2 [/ IF 2 (y; U a , P^ x )dP^ x (y)} 1/2 

(Jm\T a (pP £ J,a)dP n y^(e n (a)-T a (P r ^) 
[JlF 2 (y-,U a ,P^ x )dP^ x (y)}^ 

2[JlF 2 (y;U aj P^ x )dP^ x (y)}^ 

nfiF\y-,u a ,P n p e , x )dP n p e,M} 1/2 ' 

(43) 

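In the following we analyze each term in the above display. It holds

$$\int m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n - U_\alpha(P^{P}_{n,\varepsilon,x}) = \frac{1}{n}\sum_{k=1}^{n}\left\{m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,X_k) - P^{P}_{n,\varepsilon,x}\,m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\right\}.$$

Apply the Lindeberg-Feller theorem for the triangular array

$$Z_{n,k} := m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,X_k) - P^{P}_{n,\varepsilon,x}\,m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha).$$

For this, compute first $\mathrm{Var}(Z_{n,k})$. Observe that

$$\mathrm{Var}(Z_{n,k}) = \int m^2(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y)\,dP^{P}_{n,\varepsilon,x}(y) - \left(\int m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y)\,dP^{P}_{n,\varepsilon,x}(y)\right)^2$$
$$= \left(1-\frac{\varepsilon}{\sqrt{n}}\right)\int m^2(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y)\,dP_{\theta_n}(y) + \frac{\varepsilon}{\sqrt{n}}\,m^2(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,x)$$
$$\quad - \left(\left(1-\frac{\varepsilon}{\sqrt{n}}\right)\int m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y)\,dP_{\theta_n}(y) + \frac{\varepsilon}{\sqrt{n}}\,m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,x)\right)^2.$$

By Taylor expansions

$$m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y) = m(\theta_0,\alpha,y) + \frac{\varepsilon}{\sqrt{n}}\,m'(\theta_0,\alpha,y)^{t}\,IF(x;T_\alpha,P_{\theta_0}) + \frac{\Delta}{\sqrt{n}}\,m'(\theta_0,\alpha,y)^{t}\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\frac{\sqrt{\varepsilon^2+\Delta^2}}{\sqrt{n}},$$

$$m^2(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y) = m^2(\theta_0,\alpha,y) + \frac{2\varepsilon}{\sqrt{n}}\,m(\theta_0,\alpha,y)\,m'(\theta_0,\alpha,y)^{t}\,IF(x;T_\alpha,P_{\theta_0}) + \frac{2\Delta}{\sqrt{n}}\,m(\theta_0,\alpha,y)\,m'(\theta_0,\alpha,y)^{t}\,\mathbf{1} + \rho\left(\frac{\varepsilon}{\sqrt{n}},\frac{\Delta}{\sqrt{n}}\right)\frac{\sqrt{\varepsilon^2+\Delta^2}}{\sqrt{n}}.$$

Hence the conditions (C.2) and (C.3) assure that $\mathrm{Var}(Z_{n,k})$ is finite.

We now prove the equality

$$\mathrm{Var}(Z_{n,k}) = \int IF^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y). \qquad (44)$$

By definition,

$$IF(y;U_\alpha,P^{P}_{n,\varepsilon,x}) = \frac{\partial}{\partial t}\left[U_\alpha(P^{P}_{n,\varepsilon,x,t,y})\right]_{t=0},$$

where $P^{P}_{n,\varepsilon,x,t,y} = (1-t)P^{P}_{n,\varepsilon,x} + t\delta_y$. Also

$$U_\alpha(P^{P}_{n,\varepsilon,x,t,y}) = (1-t)\int m(T_\alpha(P^{P}_{n,\varepsilon,x,t,y}),\alpha,z)\,dP^{P}_{n,\varepsilon,x}(z) + t\,m(T_\alpha(P^{P}_{n,\varepsilon,x,t,y}),\alpha,y),$$

whence

$$IF(y;U_\alpha,P^{P}_{n,\varepsilon,x}) = -\int m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,z)\,dP^{P}_{n,\varepsilon,x}(z)$$
$$\quad + \int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,z)^{t}\,dP^{P}_{n,\varepsilon,x}(z)\,IF(y;T_\alpha,P^{P}_{n,\varepsilon,x}) + m(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,y).$$

By the definition of $T_\alpha(P^{P}_{n,\varepsilon,x})$, it holds

$$\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha,z)\,dP^{P}_{n,\varepsilon,x}(z) = 0$$

and hence (44) holds. Here we observe that $IF(y;T_\alpha,P^{P}_{n,\varepsilon,x})$ is
finite for any $n$ and any $\Delta$, since $IF(y;T_\alpha,P_\theta)$ is.

Thus, by the Lindeberg-Feller theorem for triangular arrays, the first term in the
expansion (43) converges in distribution to a variable $\mathcal{N}(0,1)$.

We have $\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = o_P(1)$ since
$\sqrt{n}\int m'(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n = O_P(1)$ (see the proof of
Proposition 6). Also, it holds $\int m''(T_\alpha(P^{P}_{n,\varepsilon,x}),\alpha)\,dP_n =$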

Op(l)^dJ d ^§^dP n = P (l). 

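Consequently, using Proposition 6 we obtain the announced result. □

Proof of Proposition 8

The level $\alpha_0$ is given by

$$\alpha_0 = 2P_{\theta_0}(\hat\phi_n > k_n(\alpha_0))$$
$$= 2P_{\theta_0}\left(\frac{\sqrt{n}(\hat\phi_n - U_\alpha(P_{\theta_0}))}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}} > \frac{\sqrt{n}(k_n(\alpha_0) - U_\alpha(P_{\theta_0}))}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right).$$

Using the asymptotic normality of $\hat\phi_n$ in the case of uncontaminated
observations (see Broniatowski and Keziou [6]),

$$\frac{\sqrt{n}(k_n(\alpha_0) - U_\alpha(P_{\theta_0}))}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}} = \Phi^{-1}\left(1-\frac{\alpha_0}{2}\right) + o(1).$$

Therefore

$$k_n(\alpha_0) = U_\alpha(P_{\theta_0}) + \frac{1}{\sqrt{n}}\,\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2} + o\left(\frac{1}{\sqrt{n}}\right). \qquad (45)$$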
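The leading term of (45) gives a practical approximation of the critical value once $U_\alpha(P_{\theta_0})$ and the integrated squared influence function are known. The following sketch evaluates that leading term and drops the $o(1/\sqrt{n})$ remainder; the function and argument names are hypothetical, and `sigma_if` denotes $[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)]^{1/2}$.

```python
import math
from statistics import NormalDist


def approximate_critical_value(u_alpha, sigma_if, n, alpha0):
    """Leading term of (45): U + Phi^{-1}(1 - alpha0/2) * sigma_if / sqrt(n).

    u_alpha  : value of U_alpha at the null model
    sigma_if : square root of the integrated squared influence function
    n        : sample size
    alpha0   : nominal level
    """
    z = NormalDist().inv_cdf(1.0 - alpha0 / 2.0)
    return u_alpha + z * sigma_if / math.sqrt(n)
```

As `n` grows the approximate critical value decreases towards `u_alpha` at rate $1/\sqrt{n}$, which is what makes the local alternatives $\theta_n = \theta_0 + \Delta/\sqrt{n}$ the relevant scale for the power calculation.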

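Now we are interested in the value of the asymptotic power when the underlying
distribution deviates slightly from the model. Using (45),

$$P_{n,\varepsilon,x} = 2P^{P}_{n,\varepsilon,x}(\hat\phi_n > k_n(\alpha_0))$$
$$= 2P^{P}_{n,\varepsilon,x}\left(\frac{\sqrt{n}(\hat\phi_n - U_\alpha(P^{P}_{n,\varepsilon,x}))}{\left[\int IF^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y)\right]^{1/2}} > -\frac{\sqrt{n}(U_\alpha(P^{P}_{n,\varepsilon,x}) - U_\alpha(P_{\theta_0}))}{\left[\int IF^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y)\right]^{1/2}}\right.$$
$$\qquad\left. + \Phi^{-1}\left(1-\frac{\alpha_0}{2}\right)\frac{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}{\left[\int IF^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y)\right]^{1/2}} + o(1)\right).$$

Expand $U_\alpha(P^{P}_{n,\varepsilon,x})$ around $U_\alpha(P_{\theta_0})$ to obtain

$$\sqrt{n}(U_\alpha(P^{P}_{n,\varepsilon,x}) - U_\alpha(P_{\theta_0})) = \varepsilon\,IF(x;U_\alpha,P_{\theta_0}) + \Delta\,\frac{\partial}{\partial a}\left[U_\alpha(P_{\theta_0+a})\right]_{a=0} + o(1).$$

Using the asymptotic normality of the test statistic when the observations are
i.i.d. with distribution $P^{P}_{n,\varepsilon,x}$ and taking into account that

$$\lim_{n\to\infty}\left[\int IF^2(y;U_\alpha,P^{P}_{n,\varepsilon,x})\,dP^{P}_{n,\varepsilon,x}(y)\right]^{1/2} = \left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2},$$

it holds

$$\lim_{n\to\infty}P_{n,\varepsilon,x} = 2 - 2\Phi\left(\Phi^{-1}\left(1-\frac{\alpha_0}{2}\right) - \frac{\Delta\,\frac{\partial}{\partial a}\left[U_\alpha(P_{\theta_0+a})\right]_{a=0}}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}} - \frac{\varepsilon\,IF(x;U_\alpha,P_{\theta_0})}{\left[\int IF^2(y;U_\alpha,P_{\theta_0})\,dP_{\theta_0}(y)\right]^{1/2}}\right).$$

A simple calculation shows that
$\frac{\partial}{\partial a}\left[U_\alpha(P_{\theta_0+a})\right]_{a=0} = \int m(\theta_0,\alpha,y)\,\frac{\dot p_{\theta_0}(y)}{p_{\theta_0}(y)}\,dP_{\theta_0}(y)$.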